LongLoRA

About the different datasets and corresponding models

Statisticss opened this issue 1 year ago · 0 comments

Thanks for this great work! I have several questions regarding the datasets and the corresponding models:

Q1: I believe you used RedPajama for fine-tuning (FT) and LongAlpaca-12k for supervised fine-tuning (SFT). You mentioned that there is no need to do FT before SFT. So can I directly take, e.g., llama2-7B-chat-hf, and do SFT on it with the LongAlpaca-12k dataset?
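In case it helps clarify Q1, here is a sketch of what such an SFT-only run might look like, assuming the repo's `supervised-fine-tune.py` entry point and typical Hugging Face-style flags (the model path, data path, output directory, and exact flag names below are illustrative assumptions; please check the README for the authoritative arguments):

```shell
# Hypothetical SFT launch directly on a chat model, skipping the RedPajama FT stage.
# Flag names and paths are assumptions based on common HF training scripts.
torchrun --nproc_per_node=8 supervised-fine-tune.py \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --data_path LongAlpaca-12k.json \
    --bf16 True \
    --model_max_length 16384 \
    --use_flash_attn True \
    --output_dir ./checkpoints/longalpaca-sft
```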

Q2: If the performance of SFT-only models is already good enough, what is the purpose of doing FT with RedPajama? FT on RedPajama would be much more time-consuming than SFT, right?

Q3: In your paper, I didn't see many results evaluating the SFT-only models; most evaluations are conducted on the FT models. Will results for the SFT-only models be added to the paper later?

Q4: On 2023.11.19, you released several models fine-tuned on the LongAlpaca-16k-length dataset. What is the difference between LongAlpaca-16k-length and LongAlpaca-12k? Will I get the same model by using the LongAlpaca-12k dataset if I set --model_max_length to 16384?
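To make the assumption behind Q4 concrete: in typical Hugging Face-style training scripts, `model_max_length` only caps how many tokens of each example are kept, so raising it does not by itself change what the underlying examples contain. A minimal sketch of that truncation behavior (a generic illustration, not LongLoRA's actual code; `truncate_to_max_length` is a made-up helper name):

```python
def truncate_to_max_length(token_ids, model_max_length):
    """Keep at most model_max_length tokens, mimicking tokenizer truncation."""
    return token_ids[:model_max_length]

# A hypothetical 20k-token example gets cut down to the cap...
long_example = list(range(20000))
print(len(truncate_to_max_length(long_example, 16384)))   # -> 16384

# ...while shorter examples pass through unchanged.
short_example = list(range(12000))
print(len(truncate_to_max_length(short_example, 16384)))  # -> 12000
```

So if the two datasets actually differ in content (not just in their length cap), training LongAlpaca-12k with `--model_max_length 16384` would not reproduce the LongAlpaca-16k-length models.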

Statisticss avatar Feb 02 '24 08:02 Statisticss