long_llama
Can FoT only be used for pre-training, or can it also be used for instruction fine-tuning?
I don’t know much about how cross-batch data is loaded during training.