Add `Format{Chat,Text}Generation{DPO,SFT}`
Description
This PR adds the following steps in order to format the batches into what the main fine-tuning frameworks / libraries (i.e. axolotl and alignment-handbook) expect, for both DPO (ORPO too) and SFT.
- `FormatTextGenerationSFT`: when the inputs are single turn, i.e. an `instruction` as input, generating a `generation` as output, optionally including a `system_prompt`. Can be directly plugged after a `TextGeneration` task and it will work out of the box, no mappings required. It will generate the following output columns: `prompt`, `prompt_id`, and `messages` (see the SFT sketch after this list).
- `FormatTextGenerationDPO`: when the inputs are single turn, i.e. an `instruction` as input, generating a list of `generations` as output with either the same or a different LLM, optionally including a `system_prompt`; and then ranked with a preference task like `UltraFeedback` so as to generate the `ratings` for each generation in `generations`. Will need some `{input,output}_mappings` to work, but should be straightforward combining `TextGeneration` tasks with, say, `UltraFeedback` (see the DPO sketch after this list). It will generate the following output columns: `prompt`, `prompt_id`, `chosen`, `chosen_model` (optional, TBD), `chosen_rating`, `rejected`, `rejected_model` (optional, TBD), and `rejected_rating`.
- `FormatChatGenerationSFT`: when the inputs are either single or multi turn conversations, but already formatted with the OpenAI format, i.e. `messages` contains the conversation as input without the last assistant response, and `generation` contains the last assistant response. Can be directly plugged after a `ChatGeneration` task and it will work out of the box, no mappings required. It will generate the following output columns: `prompt`, `prompt_id`, and `messages`.
- `FormatChatGenerationDPO`: when the inputs are either single or multi turn conversations, but already formatted with the OpenAI format, i.e. `messages` contains the conversation as input without the last assistant response, generating a list of `generations` as output with either the same or a different LLM, optionally including a `system_prompt`; and then ranked with a preference task like `UltraFeedback` so as to generate the `ratings` for each generation in `generations`. Will need some `{input,output}_mappings` to work, but should be straightforward combining `ChatGeneration` tasks with, say, `UltraFeedback`. It will generate the following output columns: `prompt`, `prompt_id`, `chosen`, `chosen_model` (optional, TBD), `chosen_rating`, `rejected`, `rejected_model` (optional, TBD), and `rejected_rating`.
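The snippet below is a minimal sketch of the SFT flow, plugging `FormatTextGenerationSFT` right after a `TextGeneration` task with no mappings. Everything around the new step (the `Pipeline` context manager, `LoadDataFromDicts`, `OpenAILLM`, the `>>` connection syntax, and the toy data) is assumed from distilabel's current API rather than defined by this PR.

```python
# Minimal sketch: single-turn SFT formatting, no mappings required.
# Assumes OPENAI_API_KEY is set in the environment.
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import FormatTextGenerationSFT, LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="sft-format-sketch") as pipeline:
    # Toy single-turn inputs: just an `instruction` column.
    load_data = LoadDataFromDicts(
        data=[{"instruction": "Explain DPO in one sentence."}],
    )
    # Produces a `generation` column (plus `model_name`) per row.
    text_generation = TextGeneration(llm=OpenAILLM(model="gpt-4"))
    # Consumes `instruction` + `generation` (and optionally `system_prompt`)
    # and emits `prompt`, `prompt_id` and `messages`.
    format_sft = FormatTextGenerationSFT()

    load_data >> text_generation >> format_sft

if __name__ == "__main__":
    distiset = pipeline.run()
```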
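For the DPO flow, one possible way to produce the `generations` and `ratings` columns the step expects is a single `TextGeneration` task configured to group several generations per row, followed by `UltraFeedback`. Again a sketch under the same assumptions: `num_generations`, `group_generations`, the `output_mappings` entry and the `aspect` value are illustrative, and combining several `TextGeneration` tasks instead would work too. The chat variants follow the same shape, swapping in `ChatGeneration` and a `messages` input.

```python
# Minimal sketch: single-turn DPO formatting driven by UltraFeedback ratings.
# Assumes OPENAI_API_KEY is set in the environment.
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import FormatTextGenerationDPO, LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration, UltraFeedback

with Pipeline(name="dpo-format-sketch") as pipeline:
    load_data = LoadDataFromDicts(
        data=[{"instruction": "Explain DPO in one sentence."}],
    )
    # Generate two candidates per instruction and group them into a single
    # list, mapped onto the `generations` column expected downstream.
    text_generation = TextGeneration(
        llm=OpenAILLM(model="gpt-4"),
        num_generations=2,
        group_generations=True,
        output_mappings={"generation": "generations"},
    )
    # Rates every generation, adding the `ratings` column.
    ultrafeedback = UltraFeedback(
        llm=OpenAILLM(model="gpt-4"),
        aspect="overall-rating",
    )
    # Builds `prompt`, `prompt_id`, `chosen`, `chosen_rating`, `rejected` and
    # `rejected_rating` from `instruction`, `generations` and `ratings`.
    format_dpo = FormatTextGenerationDPO()

    load_data >> text_generation >> ultrafeedback >> format_dpo

if __name__ == "__main__":
    distiset = pipeline.run()
```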
Closes #513
Hi @alvarobartt, didn't have the time to review thoroughly yet, but could you add some unit tests?
Hey @gabrielmbmb, sorry if this was not clear (see the commit message above): this is still very much WIP 😃 Moved it to draft to avoid misunderstandings!
Nice! Let's see how we highlight this in the docs, but looks good to me
Sure, the idea is to include this in the docs once we decide what structure we want to use for the refactor; that's why I didn't write anything anywhere but in the PR description 👍🏻