DeepSpeedExamples

Remove the fixed `eot_token` mechanism for SFT

Open • Xingfu-Yi opened this issue 1 year ago • 1 comment

Background

Not all pretrained LLMs use `<|endoftext|>` as the `eot_token`, so it should not be hardcoded.

Changes

  • Removed the hardcoded `eot_token` assignment: `args.end_of_conversation_token = "<|endoftext|>"`.
  • Added a new parser argument, `eot_token`, which defaults to `<|endoftext|>`. Users can set it manually to match the pretrained model they use.
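
A minimal sketch of what the change amounts to, for illustration only. The flag spelling `--eot_token`, the helper names `build_parser` and `parse_args`, and the step that copies the value onto `args.end_of_conversation_token` are assumptions here, not taken from the actual diff:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the argument relevant to this PR (illustrative subset)."""
    parser = argparse.ArgumentParser(description="SFT arguments (illustrative subset)")
    # Assumed flag spelling; the PR only states that the argument is called eot_token.
    parser.add_argument(
        "--eot_token",
        type=str,
        default="<|endoftext|>",
        help="End-of-text token of the pretrained model; override it when the "
             "model does not use <|endoftext|>.",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and expose the token under the previously hardcoded attribute."""
    args = build_parser().parse_args(argv)
    # Hypothetical wiring: reuse the old attribute name so downstream code keeps working.
    args.end_of_conversation_token = args.eot_token
    return args
```

With this shape, existing runs keep the old behaviour by default, while a model with a different end-of-text token could be handled by passing, for example, `--eot_token "</s>"` on the command line.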

Xingfu-Yi avatar Sep 15 '24 03:09 Xingfu-Yi

Hi @arashb, @duli2012, @awan-10, @eltonzheng,

I hope you're doing well. When you have a moment, could you kindly take a look at this PR? It has already received one approval, but it seems to be stuck and needs further reviews to move forward.

Thank you so much in advance for your time and help.

Best regards,
Yi

Xingfu-Yi avatar Sep 24 '24 12:09 Xingfu-Yi

Hi @Xingfu-Yi - we will work on getting this PR merged, sorry for the delay.

loadams avatar Oct 29 '24 22:10 loadams