Jakarta, INTI - Since the widespread adoption of Large Language Models (LLMs), data has been the lifeblood of businesses building accurate and safe AI systems. A company's data represents its cumulative knowledge and can be leveraged in many ways, from customization (Supervised Fine-Tuning, Parameter-Efficient Fine-Tuning, continued pre-training, and more) to training brand-new domain-specific Small Language Models (SLMs). Yet data, while one of the most critical pieces of a modern AI pipeline, has traditionally been costly and limiting during the development of LLMs and SLMs – from paying human annotators to sourcing large volumes of domain-specific data. Generating high-quality data remains a difficult task.
Through a process called Synthetic Data Generation (SDG), businesses can augment their existing data stores by leveraging LLMs to create customized high-quality data in large volumes.
NVIDIA is announcing a new suite of models built specifically for SDG: the Nemotron-4 340B family, which includes a state-of-the-art Reward model and an Instruct model to aid in SDG. All are released under a permissive license that enables businesses and developers alike to use the model outputs to build their own models.
NVIDIA Open Model License
With the release of the Nemotron-4 340B family of models – which includes a Base, Instruct, and Reward model – NVIDIA is introducing the NVIDIA Open Model License, a permissive license that allows distribution, modification, and use of the Nemotron-4 340B models and their outputs for personal, research, and commercial use, without attribution requirements.
Introducing Nemotron-4 340B Reward Model
Nemotron-4 340B Reward is a state-of-the-art multidimensional Reward model. The model takes a prompt and a response as input and returns a list of floating-point scores, one for each of the five attributes in the HelpSteer2 dataset: helpfulness, correctness, coherence, complexity, and verbosity.
The model has been evaluated on Reward Bench and achieves benchmark-topping performance despite being trained on only 10K human-annotated response pairs.
HelpSteer2 Dataset
With the release of Nemotron-4 340B Reward, we also introduced HelpSteer2. This dataset is permissively licensed (CC-BY-4.0) and contains ten thousand response pairs. Each prompt in the dataset has two responses, each human-annotated on a Likert-5 scale (0–4, with higher meaning better) for five attributes: helpfulness, correctness, coherence, complexity, and verbosity.
The dataset is focused on conversational data, including multi-turn conversations in the English language.
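As an illustration of this annotation scheme, a HelpSteer2-style record might look like the sketch below. The field names and example values are assumptions for illustration, not the dataset's actual schema:

```python
# A sketch of a HelpSteer2-style annotated record (field names and
# values are illustrative assumptions, not the dataset's actual schema).
ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

record = {
    "prompt": "Explain what a reward model does in one sentence.",
    "responses": [
        {
            "text": "A reward model scores responses so better ones can be preferred.",
            "labels": {"helpfulness": 4, "correctness": 4, "coherence": 4,
                       "complexity": 1, "verbosity": 1},
        },
        {
            "text": "It is a model.",
            "labels": {"helpfulness": 1, "correctness": 3, "coherence": 4,
                       "complexity": 0, "verbosity": 0},
        },
    ],
}

def validate(record):
    """Check each response is scored 0-4 on every attribute (Likert-5)."""
    for response in record["responses"]:
        for attr in ATTRIBUTES:
            score = response["labels"][attr]
            assert 0 <= score <= 4, f"{attr} out of Likert-5 range: {score}"
    return True
```

Each prompt carries exactly two scored responses, which is what makes the dataset usable both for reward-model training and for building preference pairs.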
SteerLM Reward Model Training
The Nemotron-4 340B Reward model was trained on top of the Nemotron-4 340B Base model by adding a linear layer that converts the final-layer representation of the end-of-response token into five scalar values, each corresponding to a HelpSteer2 attribute. This approach is referred to as SteerLM Reward Model training.
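The linear head itself is a small computation. A minimal NumPy sketch of the idea, using a toy hidden size rather than the real model's dimensions (this is not NVIDIA's implementation):

```python
import numpy as np

HIDDEN_SIZE = 16   # toy size; the real model's hidden size is far larger
NUM_ATTRS = 5      # helpfulness, correctness, coherence, complexity, verbosity

rng = np.random.default_rng(0)
# Linear head: maps the end-of-response token's final-layer
# representation to five scalar attribute scores.
W = rng.normal(size=(NUM_ATTRS, HIDDEN_SIZE))
b = np.zeros(NUM_ATTRS)

def reward_scores(eor_hidden_state: np.ndarray) -> np.ndarray:
    """Project the end-of-response hidden state to 5 attribute scores."""
    return W @ eor_hidden_state + b

h = rng.normal(size=HIDDEN_SIZE)  # stand-in for the transformer's output
scores = reward_scores(h)
print(scores.shape)  # (5,)
```

During training, W and b are learned (along with the base model) against the human Likert-5 labels, so each output dimension tracks one HelpSteer2 attribute.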
Synthetic Data Generation
SDG refers to the process of creating datasets that can be used for a variety of model customizations, from Supervised Fine-Tuning (SFT) and Parameter-Efficient Fine-Tuning (PEFT, including Low-Rank Adaptation, or LoRA) to model alignment (using methods such as RLAIF and DPO). SDG can be applied to a wide range of applications, from retrieval, to evaluation dataset curation, to recommender systems. For this article, we focus on model alignment as the primary use case for the Nemotron-4 340B family of models. Alignment training is a rapidly growing subdiscipline in the Generative AI domain and can be implemented in several different ways.
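To make the alignment use case concrete: per-attribute reward scores can be collapsed into the chosen/rejected pairs that DPO-style training consumes. A minimal sketch, where the averaging scheme and field names are assumptions for illustration:

```python
def overall_score(labels, weights=None):
    """Collapse per-attribute scores into one scalar.

    A simple (optionally weighted) average is used here; any
    weighting scheme could be substituted.
    """
    weights = weights or {attr: 1.0 for attr in labels}
    return sum(labels[attr] * weights[attr] for attr in labels) / sum(weights.values())

def to_preference_pair(prompt, response_a, response_b):
    """Build a DPO-style (chosen, rejected) pair from two scored responses."""
    a = overall_score(response_a["labels"])
    b = overall_score(response_b["labels"])
    chosen, rejected = (response_a, response_b) if a >= b else (response_b, response_a)
    return {"prompt": prompt, "chosen": chosen["text"], "rejected": rejected["text"]}

pair = to_preference_pair(
    "Summarize SDG in one line.",
    {"text": "SDG uses LLMs to create training data.",
     "labels": {"helpfulness": 4, "correctness": 4}},
    {"text": "It is a thing.",
     "labels": {"helpfulness": 1, "correctness": 2}},
)
print(pair["chosen"])  # "SDG uses LLMs to create training data."
```

The resulting `{"prompt", "chosen", "rejected"}` triples are the standard input format for preference-based alignment methods such as DPO.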
Synthetic Data Generation Flow
In general terms, SDG splits into two primary pieces: generation, where an Instruct model (such as Nemotron-4 340B Instruct) produces candidate synthetic responses, and quality filtering, where a Reward model (such as Nemotron-4 340B Reward) scores those candidates so that only the highest-quality data is kept.
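The two pieces can be sketched end to end with stubbed-out models. The generator and scorer below are placeholders for illustration, not the actual Nemotron-4 340B models or their APIs:

```python
import random

def generate_responses(prompt, n=4):
    """Piece 1: candidate generation (stub standing in for an Instruct model)."""
    return [f"Response {i} to: {prompt}" for i in range(n)]

def score_response(prompt, response):
    """Piece 2: quality scoring (stub standing in for a Reward model)."""
    rng = random.Random((prompt, response))  # deterministic per pair
    return rng.uniform(0, 4)

def sdg_pipeline(prompts, keep_top=1):
    """Generate candidates per prompt, then keep only the best-scored ones."""
    dataset = []
    for prompt in prompts:
        candidates = generate_responses(prompt)
        ranked = sorted(candidates,
                        key=lambda r: score_response(prompt, r),
                        reverse=True)
        for response in ranked[:keep_top]:
            dataset.append({"prompt": prompt, "response": response})
    return dataset

data = sdg_pipeline(["What is SDG?", "Why use a reward model?"])
print(len(data))  # 2
```

Swapping the stubs for real model calls (and raising `n` and `keep_top`) is what lets a pipeline like this scale to the volumes described next.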
NVIDIA researchers demonstrated the effectiveness of SDG in the HelpSteer2 paper. A total of 100K rows of conversational synthetic data were created through the above pipeline. Using this dataset, the NVIDIA research team aligned Llama 3 70B (base model) to match or exceed Llama 3 70B Instruct on a number of standard benchmarks – despite using only 1% of the human-annotated data that the Llama 3 70B Instruct model was trained with.
Data serves as the backbone of LLMs. Recognizing Synthetic Data Generation as the next frontier of improving Gen AI applications for enterprises, NVIDIA offers the Nemotron-4 340B family of models and SDG pipeline to enable developers and enterprises alike to turbo-charge a wide range of synthetic data use cases, with a permissive license and one of the highest-quality, openly available instruct and reward models.