Add back support for longest sequence first
@awaelchli Semi-related to this PR. I just noticed that we no longer have the code to run the longest sample at the beginning of training: https://github.com/Lightning-AI/litgpt/blob/globals/finetune/lora.py#L268-L270 Should we add it back? It's useful for triggering an OOM as early as possible. If not, let's drop the `longest_seq_ix` variable entirely.
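For context, a minimal sketch of the kind of early-OOM check being discussed: run one forward/backward pass on the longest sample before real training starts, so that peak memory is exercised on step zero. All names here (`sanity_check_longest_sample`, the batch keys, the loss setup) are hypothetical placeholders, not the actual code at the linked lines.

```python
import torch


def sanity_check_longest_sample(model, dataset, longest_seq_ix: int) -> None:
    # Hypothetical sketch: do a full forward/backward on the longest sample
    # up front so an out-of-memory error surfaces immediately, not hours in.
    batch = dataset[longest_seq_ix]
    input_ids = batch["input_ids"].unsqueeze(0)
    targets = batch["labels"].unsqueeze(0)
    logits = model(input_ids)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-100
    )
    loss.backward()  # allocates activation and gradient memory at peak size
    model.zero_grad(set_to_none=True)  # discard the throwaway gradients
```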
I'm fine with returning the longest element first if that's possible to implement in the SFTDataset. I would also move the responsibility of selecting the longest sample to the datamodule/dataset, so that this logic doesn't have to live in the script, and expose a simple method/attribute for the `longest_seq_length`. That way we can precompute it while loading the dataset and don't have to iterate over the whole dataset a second time.
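One way this could look, as an illustrative sketch only (this is not litgpt's actual `SFTDataset`, and the constructor arguments are assumptions): compute the lengths once while loading, expose `longest_seq_length` and `longest_seq_ix` as attributes, and optionally reorder so the longest sample comes first.

```python
from torch.utils.data import Dataset


class SFTDatasetSketch(Dataset):
    """Hypothetical dataset that precomputes the longest sample on load."""

    def __init__(self, samples: list[dict], longest_first: bool = False):
        # Single pass over the data at load time; no second iteration needed.
        lengths = [len(s["input_ids"]) for s in samples]
        self.longest_seq_ix = max(range(len(lengths)), key=lengths.__getitem__)
        self.longest_seq_length = lengths[self.longest_seq_ix]
        if longest_first:
            # Move the longest sample to index 0 so the very first training
            # step exercises peak memory and OOMs as early as possible.
            samples = [samples[self.longest_seq_ix]] + [
                s for i, s in enumerate(samples) if i != self.longest_seq_ix
            ]
        self.samples = samples

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, ix: int) -> dict:
        return self.samples[ix]
```

Note that returning the longest element first only helps if the dataloader doesn't shuffle; with a shuffling sampler, the datamodule would instead expose `longest_seq_ix`/`longest_seq_length` and let the script run the up-front check itself.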
From https://github.com/Lightning-AI/litgpt/pull/1179#discussion_r1538392383