Add `Format{Chat,Text}Generation{DPO,SFT}`
Description
This PR adds the following steps in order to format the batches into what the main fine-tuning frameworks / libraries (i.e. axolotl and alignment-handbook) expect, for both DPO (ORPO too) and SFT.
- `FormatTextGenerationSFT`: when the inputs are single turn, i.e. an `instruction` as input, generating a `generation` as output, optionally including a `system_prompt`. Can be directly plugged after a `TextGeneration` task and it will work out of the box, no mappings required. It will generate the following output columns: `prompt`, `prompt_id`, and `messages` (see the SFT sketch after this list).
- `FormatTextGenerationDPO`: when the inputs are single turn, i.e. an `instruction` as input, generating a list of `generations` as output with either the same or a different LLM, optionally including a `system_prompt`; and then ranked with a preference task like `UltraFeedback` so as to generate the `ratings` for each generation in `generations`. Will need some `{input,output}_mappings` to work, but should be straightforward combining `TextGeneration` tasks with, say, `UltraFeedback` (see the DPO sketch after this list). It will generate the following output columns: `prompt`, `prompt_id`, `chosen`, `chosen_model` (optional, TBD), `chosen_rating`, `rejected`, `rejected_model` (optional, TBD), and `rejected_rating`.
- `FormatChatGenerationSFT`: when the inputs are either single or multi turn conversations, but already formatted with the OpenAI format, i.e. `messages` contains the conversation as input without the last assistant response, and `generation` contains the last assistant response. Can be directly plugged after a `ChatGeneration` task and it will work out of the box, no mappings required. It will generate the following output columns: `prompt`, `prompt_id`, and `messages`.
- `FormatChatGenerationDPO`: when the inputs are either single or multi turn conversations, but already formatted with the OpenAI format, i.e. `messages` contains the conversation as input without the last assistant response, generating a list of `generations` as output with either the same or a different LLM, optionally including a `system_prompt`; and then ranked with a preference task like `UltraFeedback` so as to generate the `ratings` for each generation in `generations`. Will need some `{input,output}_mappings` to work, but should be straightforward combining `ChatGeneration` tasks with, say, `UltraFeedback`. It will generate the following output columns: `prompt`, `prompt_id`, `chosen`, `chosen_model` (optional, TBD), `chosen_rating`, `rejected`, `rejected_model` (optional, TBD), and `rejected_rating`.
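The snippet below is a minimal sketch of the SFT flow, plugging `FormatTextGenerationSFT` right after a `TextGeneration` task with no mappings. Everything around the new step (the `Pipeline` context manager, `LoadDataFromDicts`, `OpenAILLM`, the `>>` connection syntax, and the toy data) is assumed from distilabel's current API rather than defined by this PR.

```python
# Minimal sketch: single-turn SFT formatting, no mappings required.
# Assumes OPENAI_API_KEY is set in the environment.
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import FormatTextGenerationSFT, LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="sft-format-sketch") as pipeline:
    # Toy single-turn inputs: just an `instruction` column.
    load_data = LoadDataFromDicts(
        data=[{"instruction": "Explain DPO in one sentence."}],
    )
    # Produces a `generation` column (plus `model_name`) per row.
    text_generation = TextGeneration(llm=OpenAILLM(model="gpt-4"))
    # Consumes `instruction` + `generation` (and optionally `system_prompt`)
    # and emits `prompt`, `prompt_id` and `messages`.
    format_sft = FormatTextGenerationSFT()

    load_data >> text_generation >> format_sft

if __name__ == "__main__":
    distiset = pipeline.run()
```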
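For the DPO flow, one possible way to produce the `generations` and `ratings` columns the step expects is a single `TextGeneration` task configured to group several generations per row, followed by `UltraFeedback`. Again a sketch under the same assumptions: `num_generations`, `group_generations`, the `output_mappings` entry and the `aspect` value are illustrative, and combining several `TextGeneration` tasks instead would work too. The chat variants follow the same shape, swapping in `ChatGeneration` and a `messages` input.

```python
# Minimal sketch: single-turn DPO formatting driven by UltraFeedback ratings.
# Assumes OPENAI_API_KEY is set in the environment.
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import FormatTextGenerationDPO, LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration, UltraFeedback

with Pipeline(name="dpo-format-sketch") as pipeline:
    load_data = LoadDataFromDicts(
        data=[{"instruction": "Explain DPO in one sentence."}],
    )
    # Generate two candidates per instruction and group them into a single
    # list, mapped onto the `generations` column expected downstream.
    text_generation = TextGeneration(
        llm=OpenAILLM(model="gpt-4"),
        num_generations=2,
        group_generations=True,
        output_mappings={"generation": "generations"},
    )
    # Rates every generation, adding the `ratings` column.
    ultrafeedback = UltraFeedback(
        llm=OpenAILLM(model="gpt-4"),
        aspect="overall-rating",
    )
    # Builds `prompt`, `prompt_id`, `chosen`, `chosen_rating`, `rejected` and
    # `rejected_rating` from `instruction`, `generations` and `ratings`.
    format_dpo = FormatTextGenerationDPO()

    load_data >> text_generation >> ultrafeedback >> format_dpo

if __name__ == "__main__":
    distiset = pipeline.run()
```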
Closes #513
Hi @alvarobartt, didn't have the time to review thoroughly yet, but could you add some unit tests?
Hey @gabrielmbmb, sorry if this was not clear (see the commit message above): this is still very much WIP 😃 Moved it to draft to avoid misunderstandings!
Nice! Let's see how we highlight this in the docs, but looks good to me
Sure, the idea is to include this in the docs once we decide what structure we want to use for the refactor; that's why I didn't write anything anywhere but in the PR description 👍🏻