
SFT training doesn't fully go through all samples

Open hanxiaotian opened this issue 2 years ago • 3 comments

Current training uses ConstantLengthDataset. This dataset returns a fixed number of tokens (2048) at every step; however, the total number of steps is calculated from the number of samples. I checked some samples and found that quite a few of them are much longer than 2048 tokens (~7000), which means parts of those samples are never seen in one epoch of training.
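To make the mismatch concrete, here is a minimal, self-contained sketch of how a packed dataset behaves. This is illustrative only, not TRL's actual ConstantLengthDataset code; the function name and defaults are made up for the example.

```python
# Illustrative sketch of packed-dataset behavior (NOT TRL's actual
# ConstantLengthDataset implementation; names/defaults are hypothetical).
def pack(samples, seq_length=8, eos_id=0):
    """Concatenate token lists separated by EOS and yield fixed-length
    chunks. The number of chunks depends on total token count, not on
    the number of samples, so step counts based on len(samples) can
    disagree with the number of batches actually produced."""
    buffer, chunks = [], []
    for tokens in samples:
        buffer.extend(tokens + [eos_id])
        while len(buffer) >= seq_length:
            chunks.append(buffer[:seq_length])
            buffer = buffer[seq_length:]
    return chunks  # leftover tokens in the buffer are dropped

# Three samples, one much longer than seq_length: packing yields
# 2 fixed-length chunks from 3 samples.
samples = [[1, 2, 3], [4] * 15, [5, 6]]
chunks = pack(samples)
```

Because the trainer derives the step count from the number of samples rather than from the number of packed chunks, the two quantities diverge exactly as described above.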

Could you please verify if my understanding is correct?

Thanks, I appreciate it.

hanxiaotian avatar Dec 03 '23 21:12 hanxiaotian

Hello @hanxiaotian, yes, there is a small bug in TRL's SFTTrainer in how the training steps are counted, and it is being fixed here: https://github.com/huggingface/trl/pull/979

lewtun avatar Dec 04 '23 08:12 lewtun

Another quick question: after concatenating tokens from different samples separated by the EOS token, is the loss calculated over the whole sequence without any mask? Is my understanding correct? Thanks!

hanxiaotian avatar Dec 05 '23 00:12 hanxiaotian

So the fix is merged, but there is no release yet; once there is, the requirements should be updated to the new version of TRL.

Randl avatar Dec 08 '23 11:12 Randl