alignment-handbook
SFT training doesn't fully go through all samples
Current training uses ConstantLengthDataset. This dataset returns a fixed-length sequence of tokens (2048) at every step; however, the total number of steps is calculated from the number of samples. I checked some samples and found that quite a few of them are much longer than 2048 tokens (~7000), which means that some samples are never fully seen within one epoch of training.
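For illustration, here is a minimal sketch (not TRL's actual implementation) of how fixed-length packing can interact with a step count derived from the number of raw samples; `pack_examples`, the toy corpus, and the EOS id are all hypothetical:

```python
from typing import Iterator, List

def pack_examples(tokenized: List[List[int]], seq_length: int = 2048,
                  eos_id: int = 0) -> Iterator[List[int]]:
    """Concatenate samples, separated by EOS, and yield fixed-length chunks."""
    buffer: List[int] = []
    for ids in tokenized:
        buffer.extend(ids + [eos_id])
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]

# Toy corpus: one ~7000-token sample plus two short ones.
corpus = [[1] * 7000, [2] * 500, [3] * 300]
chunks = list(pack_examples(corpus))
print(len(chunks))  # 3 fixed-length steps are needed to cover ~7800 tokens,
# but a trainer that derives max_steps from len(corpus) (3 samples) divided
# by the batch size can stop before this iterator is exhausted, so the tail
# of the long sample is never trained on.
```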
Could you please verify if my understanding is correct?
Thanks, I appreciate it.
Hello @hanxiaotian, yes, there is a small bug in TRL's SFTTrainer in how the training steps are counted; it is being fixed here: https://github.com/huggingface/trl/pull/979
Another quick question: after concatenating tokens from different samples, separated by an EOS token, is the loss calculated over the whole sequence without any mask? Is my understanding correct? Thanks!
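For reference, here is a hedged sketch of what that would look like when a packed batch is fed to a causal LM with labels set to a plain copy of input_ids: cross-entropy is then computed at every position, EOS separators included, with no mask between the concatenated samples. The token ids and vocabulary size below are made up, and whether SFTTrainer applies any masking here should be confirmed against the TRL source:

```python
import torch
import torch.nn.functional as F

vocab_size = 32
seq = torch.tensor([[5, 6, 7, 2, 8, 9, 2]])  # two samples joined by EOS id 2
labels = seq.clone()                          # no per-sample loss mask

logits = torch.randn(1, seq.size(1), vocab_size)  # stand-in for LM output
# Shift so position t predicts token t+1; every target token contributes,
# including the EOS separators between the packed samples.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
)
print(loss.item())
```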
So the fix is merged, but there is no release yet; once there is one, the requirements should be updated to the new version of TRL.
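Until that release ships, one possible workaround (assuming you are comfortable tracking the main branch) is installing TRL from source so the merged fix is picked up:

```bash
pip install git+https://github.com/huggingface/trl.git
```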