Sr Language Data Scientist Search Specialization

Innodata
Ridgefield Park, NJ

Job description

Who we are:

Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider-of-choice to 4 out of 5 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine.

By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we’re helping usher in the promise of clean and optimized digital data to all industries. Innodata offers a powerful combination of both digital data solutions and easy-to-use, high-quality platforms.

Our global workforce includes over 3,000 employees in the United States, Canada, United Kingdom, the Philippines, India, Sri Lanka, Israel and Germany. We’re poised for a period of explosive growth over the next few years.

Position Summary:

Who We’re Looking For:

You have at least 5 years of relevant experience with data creation, curation, and analysis for search and information retrieval systems, including work with GenAI applications (e.g., neural ranking, semantic search, query understanding, RAG-enhanced search, multi-stage ranking pipelines). Your experience spans creating and annotating search datasets, from query-document pairs to relevance judgments and query intent classifications. You have demonstrated success working on search product challenges such as relevance optimization, query intent understanding, or improving search result diversity and freshness. You understand the unique data annotation challenges in search, such as inter-rater disagreement on relevance, context-dependent query understanding, and geographic and temporal relevance.

You are experienced in driving long-term projects for which you set the strategic plan, drawing on your knowledge of AI, data science, and process design. You are an expert at working cross-functionally with both technical and non-technical stakeholders. Despite ambiguity, you use your technical knowledge and your experience of working with multiple stakeholders to drive solutions.

You bring a research-oriented mindset towards developing long-term excellence in search systems. You are an expert in designing collection, evaluation and quality assurance processes for search data, using human-in-the-loop and synthetic techniques. You understand search-specific evaluation metrics and quality frameworks, and you can design human relevance judging workflows that account for query ambiguity and subtlety.

Your understanding of machine learning, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), neural ranking architectures, and dense retrieval methods helps you tackle search and information retrieval challenges with a critical, innovative mindset. You can assess how GenAI techniques improve search relevance, ranking, and user experience.

Tell Me More:

As a Senior Language Data Scientist, you lead projects and own processes for optimizing search and retrieval systems by creating, validating and annotating search-specific data for LLM/ML applications. This includes query-document pairs, relevance judgments, query intent labels, search result quality assessments, and multimodal search scenarios (image search, product search, news search). You work across different search domains, from web search to e-commerce to vertical search. You consult and engage with customers to understand their business goals and design processes to meet them. You generate insights about the client’s processes and products to drive improvement and innovation. You advise and support business unit heads on engaging with customers to understand the upstream activities that would be performed using Innodata’s services.

Responsibilities:

  • Lead long-term projects with high complexity and ambiguity, from the first discussion with the client to completion

  • Design and improve workflows to create data for AI/ML training and evaluation, including human annotation and data-collection workflows as well as synthetic ones

  • Design and refine search data annotation frameworks, including relevance judging guidelines that handle nuanced query-document relationships, query ambiguity, and domain-specific search challenges (e.g., freshness for news search, user intent for product search)

  • Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers

  • Assess and optimize search-specific evaluation approaches, including A/B testing frameworks, ranking metrics, and human evaluation studies for search result quality

  • Critically assess annotation tooling and workflows

  • Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance

  • Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions, and executing them.

  • Set an ambitious research agenda for improving our products and services

  • Contribute to establishing best practices and standards for generative AI development with customers and within the organization

Job requirements

  • MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific / quantitative field; PhD strongly preferred

  • Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists.

  • Collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals

  • Design efficient data strategies for complex long-term projects, potentially involving multiple teams and workflows.

  • Knowledge of how the components of GenAI products and services work together

  • Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders

  • Familiarity with GenAI technologies that enables you to improve existing processes to handle future challenges.

  • Search and Language Data Expertise: Extensive experience working with search-specific language data (queries, documents, relevance judgments, intent labels) and designing human evaluation tasks, including multi-phase and complex workflows. You have hands-on experience with query annotation frameworks and understand the semantic relationship between queries and documents.

  • Quantitative Analysis Skills: Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.

Technical skills:

  • Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face

  • Proficiency in Python to:

    • handle / transform large datasets (e.g. pre- and postprocessing data, pandas)

    • perform quantitative analyses

    • visualize data (for example matplotlib, seaborn)

Data processing:

  • Deep understanding of data pipelines to support ML and NLP workflows

  • Knowledge of efficient data collection, transformation, and storage

  • Knowledge of data structures, algorithms, and data engineering principles

  • Excellent interpersonal skills for effective cross-functional stakeholder engagement

  • Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions

  • Ability to work independently and collaborate as part of a team

  • Adaptable to changing technologies and methodologies

  • Ability to translate experience, research and development information to understand client products and services.

  • Providing technical mentorship and guidance to junior team members

Preferred Skills

  • Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques

  • Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency

  • Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation

  • Model Fine-Tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance

  • Understanding of generative model families such as GPT, VAEs, and GANs

Please be aware of recruitment scams involving individuals or organizations falsely claiming to represent employers. Innodata will never ask for payment, banking details, or sensitive personal information during the application process. To learn more on how to recognize job scams, please visit the Federal Trade Commission’s guide at

If you believe you’ve been targeted by a recruitment scam, please report it to Innodata at [email protected] and consider reporting it to the FTC at ReportFraud.ftc.gov.

Posted 2026-02-28
