Language Data Scientist

Innodata
Ridgefield Park, NJ

Job description

Who We Are:

Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider-of-choice to 4 out of 5 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine.

By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we're helping bring the promise of clean and optimized digital data to all industries. Innodata offers a powerful combination of digital data solutions and easy-to-use, high-quality platforms.

Our global workforce includes over 3,000 employees in the United States, Canada, United Kingdom, the Philippines, India, Sri Lanka, Israel and Germany. We’re poised for a period of explosive growth over the next few years.

Position Summary:

Innodata is building a team of Language Data Scientists and GenAI experts to help our customers advance their GenAI applications. You will work hands-on with multi-modal and multi-lingual datasets and collaborate with cross-functional partners. You will use your experience with human and synthetic data workflows to drive innovation and continuous improvement. The ideal candidate has the right mix of skills in (computational) linguistics and human evaluation tasks, data science, and data engineering.

Who We’re Looking For:

You have at least 3 years of relevant experience with data creation, curation, and analysis for GenAI applications (e.g., RAG, agents, complex reasoning). You are an expert in designing collection, evaluation, and quality assurance processes using human-in-the-loop and synthetic techniques. You bring a wealth of expertise in language, culture, and multi-lingual projects. You are experienced in analyzing data with advanced statistical tools and driving success through process excellence.

Your understanding of machine learning, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) helps you tackle challenges with a critical, innovative mindset. You're also a strong communicator who excels at cross-functional collaboration and at understanding business needs.

Tell Me More:

As a Language Data Scientist, you design and own processes for creating, validating, and annotating data for use in LLM/ML applications. This can be natural language data or multimodal data, including images, video, audio, and other formats. You consult and engage with customers to understand their business goals and design processes to meet them. You generate insights about the client's processes and products to drive improvement and innovation. You advise and support business unit heads in engaging with customers to understand the upstream activities that would be performed using Innodata's services.

Responsibilities:

  • Design and improve workflows that create data for AI/ML training and evaluation, including human annotation and data collection workflows as well as synthetic ones.

  • Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers.

  • Critically assess annotation tooling and workflows.

  • Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.

  • Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions and executing them.

Job Requirements

  • Knowledge of how the components of GenAI products and services work together

  • Experience collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals

  • MA in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related scientific or quantitative field; PhD strongly preferred

  • Language and language data expertise: Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows.

    • Deep understanding of language and its relationship with culture

    • Ability to identify ambiguity and subjectivity in language

    • Ability to work with multi-lingual and multi-modal projects

  • Quantitative Analysis Skills: Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.

  • Technical skills:

    • Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.

    • Proficiency in Python to:

      • handle and transform large datasets (e.g., pre- and post-processing data with pandas)

      • perform quantitative analyses

      • visualize data (for example matplotlib, seaborn)

  • Data processing:

    • Deep understanding of data pipelines to support ML and NLP workflows

    • Knowledge of efficient data collection, transformation, and storage

    • Knowledge of data structures, algorithms, and data engineering principles

  • Excellent interpersonal skills for effective cross-functional stakeholder engagement

  • Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions

  • Ability to work independently and collaborate as part of a team

  • Adaptable to changing technologies and methodologies

  • Ability to apply experience and research and development information to understand client products and services.

Preferred Skills

  • Conducting research to stay up to date with the latest advancements in generative AI, machine learning, and deep learning techniques

  • Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency

  • Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation

  • Model Fine-Tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance

  • Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders

  • Contributing to establishing best practices and standards for generative AI development with customers and within the organization

  • Providing technical mentorship and guidance to junior team members

  • Understanding of generative model architectures such as GPT-style transformers, VAEs, and GANs

Please be aware of recruitment scams involving individuals or organizations falsely claiming to represent employers. Innodata will never ask for payment, banking details, or sensitive personal information during the application process. To learn more about how to recognize job scams, please visit the Federal Trade Commission's guide at

If you believe you've been targeted by a recruitment scam, please report it to Innodata and consider reporting it to the FTC at ReportFraud.ftc.gov.

Posted 2026-02-28