deita
The length of samples
It seems each sample in the deita dataset consists of many turns and is very long (>10k tokens). Your paper mentions that the maximum input length for SFT is 2048 tokens. Does that mean most of the text in each training sample is truncated and discarded?
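For reference, here is a minimal sketch of how one could measure per-sample token counts to check this; the dataset id, the ShareGPT-style `conversations` field, and the tokenizer used below are assumptions on my part, not anything specified by the paper or repo.

```python
# Sketch: count tokens per conversation and see how many exceed the 2048 limit.
# Dataset id, field names, and tokenizer are assumptions for illustration only.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed base tokenizer
ds = load_dataset("hkust-nlp/deita-10k-v0", split="train")             # assumed dataset id

def count_tokens(example):
    # Join all turns of the conversation and tokenize the full text.
    text = "\n".join(turn["value"] for turn in example["conversations"])
    return {"n_tokens": len(tokenizer(text)["input_ids"])}

with_lengths = ds.map(count_tokens)
over_limit = sum(n > 2048 for n in with_lengths["n_tokens"])
print(f"{over_limit}/{len(ds)} samples exceed 2048 tokens")
```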