NVIDIA Unveils Latest Models for Synthetic Data Generation with NVIDIA Nemotron-4 340B

  Editorial INTI     2 months ago

Jakarta, INTI - Since the introduction and widespread adoption of Large Language Models (LLMs), data has been the lifeblood of businesses building accurate and safe AI systems. A company’s data represents its cumulative knowledge and can be leveraged in various ways, from customization (Supervised Fine-Tuning, Parameter-Efficient Fine-Tuning, continued pre-training, and more) to training brand-new domain-specific Small Language Models (SLMs). Yet data, while one of the most critical pieces of a modern AI pipeline, has traditionally been costly and limiting during the development of innovative LLMs and SLMs, from paying human annotators to sourcing large volumes of domain-specific data. Generating high-quality data remains a difficult task.

Through a process called Synthetic Data Generation (SDG), businesses can augment their existing data stores by leveraging LLMs to create customized high-quality data in large volumes.

NVIDIA is announcing a new suite of models built specifically for SDG: the Nemotron-4 340B family, which includes a state-of-the-art Reward Model and an Instruct model to aid in SDG. All are released under a permissive license that enables businesses and developers alike to use the models' outputs to build their own models.

NVIDIA Open Model License

With the release of the Nemotron-4 340B family of models – which includes a Base, Instruct, and Reward Model – NVIDIA is introducing the NVIDIA Open Model License, a permissive license that allows distribution, modification, and use of the Nemotron-4 340B models and their outputs for personal, research, and commercial use, without attribution requirements.

Introducing Nemotron-4 340B Reward Model

Nemotron-4 340B Reward Model is a state-of-the-art multidimensional Reward Model. The model takes a prompt and a response as input and returns a list of five floating-point scores, one for each of the five attributes in the HelpSteer2 dataset, listed below.

The model has been evaluated using Reward Bench and shown to achieve benchmark-topping performance despite being trained on only 10K human-annotated response pairs.

HelpSteer2 Dataset

With the release of Nemotron-4 340B Reward, we also introduced HelpSteer2. This dataset is permissively licensed (CC-BY-4.0) and contains ten thousand response pairs. Each prompt in the dataset has two responses, each human-annotated on a Likert-5 scale (0–4, with higher meaning better) for five attributes:

  • Helpfulness: Overall helpfulness of the response to the prompt.
  • Correctness: Inclusion of all pertinent facts without errors.
  • Coherence: Consistency and clarity of expression.
  • Complexity: Intellectual depth required to write a response (i.e., whether the response can be written by anyone with basic language competency or requires deep domain expertise).
  • Verbosity: Amount of detail included in the response, relative to what is asked for in the prompt.

The dataset is focused on conversational data, including multi-turn conversations in the English language.
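To make the annotation scheme concrete, the sketch below shows what a HelpSteer2-style record might look like, with two responses to one prompt each scored 0–4 on the five attributes. The field names, example texts, and the `preferred` helper are illustrative assumptions, not the dataset's exact schema.

```python
# Hypothetical HelpSteer2-style record: one prompt, two responses,
# each annotated on a 0-4 Likert scale for the five attributes.
# Field names and texts are illustrative, not the dataset's exact schema.
record = {
    "prompt": "Explain what a reward model does in one sentence.",
    "responses": [
        {
            "text": "A reward model scores candidate responses so that "
                    "better ones can be preferred during training.",
            "labels": {"helpfulness": 4, "correctness": 4, "coherence": 4,
                       "complexity": 2, "verbosity": 1},
        },
        {
            "text": "It is a model.",
            "labels": {"helpfulness": 1, "correctness": 2, "coherence": 3,
                       "complexity": 0, "verbosity": 0},
        },
    ],
}

def preferred(rec, attribute="helpfulness"):
    """Return the response text with the higher score on one attribute."""
    return max(rec["responses"], key=lambda r: r["labels"][attribute])["text"]

print(preferred(record))  # the more helpful of the two responses
```

Keeping the five attributes as separate scores, rather than collapsing them into a single number, is what lets downstream pipelines rank responses by whichever attribute matters for the task at hand.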

SteerLM Reward Model Training

The Nemotron-4 340B Reward Model was trained on top of the Nemotron-4 340B Base model with an additional linear layer that converts the final-layer representation of the end-of-response token into five scalar values, each corresponding to a HelpSteer2 attribute. This approach is referred to as SteerLM Reward Model training.
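The linear-head idea can be sketched in a few lines of NumPy: a single weight matrix maps the end-of-response token's hidden state to five scalars, one per attribute. The hidden size, weights, and `reward_head` function here are toy stand-ins, not the model's actual parameters.

```python
import numpy as np

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]
HIDDEN = 8  # toy hidden size; the real model's is far larger

rng = np.random.default_rng(0)
W = rng.normal(size=(len(ATTRIBUTES), HIDDEN))  # linear head weights (untrained stand-ins)
b = np.zeros(len(ATTRIBUTES))                   # linear head bias

def reward_head(eos_hidden_state):
    """Map the end-of-response token's hidden state to five scalar rewards."""
    scores = W @ eos_hidden_state + b
    return dict(zip(ATTRIBUTES, scores.tolist()))

h = rng.normal(size=HIDDEN)  # stand-in for the final-layer EOS representation
print(reward_head(h))
```

During training, the head's weights would be fit so that these five outputs regress toward the human Likert annotations; this sketch only shows the shape of the computation.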

Synthetic Data Generation

SDG refers to the process of creating datasets that can be used for a variety of model customizations, from Supervised Fine-Tuning (SFT) and Parameter-Efficient Fine-Tuning (PEFT), including Low-Rank Adaptation (LoRA), to model alignment (using methods such as RLAIF, DPO, and more). SDG can be applied to a wide range of applications, from retrieval, to evaluation dataset curation, to recommender systems. For this article, we focus on model alignment as the primary use case for the Nemotron-4 340B family of models. Alignment training is a rapidly growing subdiscipline in the Generative AI domain and can be implemented in several different ways.

Synthetic Data Generation Flow

In general terms, SDG splits into two primary steps, outlined below.

  1. Synthetic Response Generation: Synthetic response data can be generated by giving domain-specific input queries to the Nemotron-4 340B Instruct model. This allows the model to generate responses that are aligned with the input query in a format similar to those used in the Instruction Tuning with GPT-4 paper. These responses can be generated with a zero-shot, few-shot, or chain-of-thought style prompt, depending on the desired response format.
  2. Reward Model Verification: Due to the multi-attribute nature of Nemotron-4 340B Reward, synthetic responses can be ranked by the most desired HelpSteer2 attributes so that only the highest-performing responses are retained. This emulates human evaluation of response quality and adds a layer of quality monitoring in SDG pipelines.
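The two steps above can be sketched as a generate-then-filter loop. In this sketch, `generate` and `score` are hypothetical stand-ins for calls to the Instruct and Reward models; a real pipeline would replace them with actual inference calls.

```python
# Minimal sketch of the two-step SDG flow. generate() and score() are
# hypothetical stand-ins for the Instruct and Reward models.
def generate(prompt, n=4):
    # Stand-in: sample n candidate responses from an Instruct model.
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def score(prompt, response):
    # Stand-in: the Reward model's five attribute scores for one response.
    return {"helpfulness": 3.0, "correctness": 3.0, "coherence": 3.0,
            "complexity": 1.0, "verbosity": 2.0}

def sdg_pipeline(prompts, keep=1, attribute="helpfulness"):
    """Generate candidates, score each one, and retain the top responses."""
    dataset = []
    for prompt in prompts:
        candidates = generate(prompt)
        ranked = sorted(candidates,
                        key=lambda r: score(prompt, r)[attribute],
                        reverse=True)
        dataset.extend((prompt, r) for r in ranked[:keep])
    return dataset

pairs = sdg_pipeline(["What is SDG?", "Define a reward model."])
print(len(pairs))  # one retained response per prompt
```

Ranking by a single attribute keeps the sketch short; a production pipeline might instead combine several HelpSteer2 attributes, for example by weighting helpfulness and correctness while penalizing verbosity.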

NVIDIA researchers were able to demonstrate the effectiveness of SDG in the HelpSteer2 paper. A total of 100K rows of conversational synthetic data were created through the above pipeline. Using this dataset, the NVIDIA research team was able to align Llama 3 70B (base model) to match or exceed Llama 3 70B Instruct on a number of standard benchmarks. This was achieved despite using only 1% of the human-annotated data that the Llama 3 70B Instruct model was trained with.

Data serves as the backbone of LLMs. Recognizing Synthetic Data Generation as the next frontier of improving Gen AI applications for enterprises, NVIDIA offers the Nemotron-4 340B family of models and SDG pipeline to enable developers and enterprises alike to turbo-charge a wide range of synthetic data use cases, with a permissive license and one of the highest-quality, openly available instruct and reward models.
