LongLoRA
About the different datasets and corresponding models
Thanks for this great work! I have several questions regarding the datasets and the corresponding models:
Q1: I think you have used RedPajama for FT and LongAlpaca-12k for SFT. You mentioned that there is no need to do FT before SFT. So can I directly use, e.g., llama2-7B-chat-hf, to do SFT using the LongAlpaca-12k dataset?
Q2: If the performance of SFT-only models is already good enough, what's the purpose of doing FT with RedPajama? I mean, FT on RedPajama would be much more time-consuming than SFT, right?
Q3: In your paper, I didn't see many results evaluating the SFT-only models; most evaluations are conducted on the FT models. Will results for the SFT-only models be added to the paper later?
Q4: On 2023.11.19, you released several models fine-tuned on the LongAlpaca-16k-length dataset. What's the difference between LongAlpaca-16k-length and LongAlpaca-12k? Would I get the same model by training on the LongAlpaca-12k dataset if I set --model_max_length to 16384?
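To make Q4 concrete, this is roughly the invocation I have in mind (a sketch only — the script name and flags other than --model_max_length are my assumptions from a typical HF-Trainer-style setup, and the paths are placeholders):

```shell
# Hypothetical SFT run on LongAlpaca-12k with the context length raised to 16k.
# Script name and flag names are assumed; only --model_max_length comes from my question.
python supervised-fine-tune.py \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --data_path LongAlpaca-12k.json \
    --model_max_length 16384 \
    --output_dir ./output-sft-16k
```

Is this the right way to reproduce the 16k-length models, or was LongAlpaca-16k-length built differently from LongAlpaca-12k?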