LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
### I followed the steps in the readme but encountered the following errors in SFT. [WARNING] async_io requires the dev libaio .so object and headers but these were not found....
Hi, thanks for the great work. I have a question regarding the **training set used** for the different types of models (**fully fine-tuned, LoRA+, and the models for the extra experiments in the paper**). In the...
How did you create the answers for the two-paper comparison examples, such as the following one? Were they generated with GPT, or written by a human?
First, I ran the commands as follows: ``` CUDA_VISIBLE_DEVICES=1 torchrun --nproc_per_node=1 --master_port=29501 supervised-fine-tune.py \ --model_name_or_path /mnt/42_store/lhj/data/mllm/model_weights/Llama-2-7b-chat-hf \ --bf16 True \ --output_dir outputs \ --model_max_length 16384 \ --use_flash_attn True \ --data_path...
Some of your uploaded Hugging Face models lack the `rope_scaling` parameter in the config. Without `rope_scaling`, the model will generate `" " " " " "`. `"rope_scaling": {"factor": 2.0,...
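For reference, the missing field can be patched into a downloaded `config.json` before loading the model. This is a minimal sketch, not the authors' official fix; the `{"type": "linear", "factor": ...}` shape follows the Hugging Face `LlamaConfig` convention, and the factor value of 2.0 is an assumption taken from the truncated snippet above:

```python
import json

def patch_rope_scaling(config: dict, factor: float = 2.0) -> dict:
    """Add a linear rope_scaling entry if the config lacks one.

    Leaves an existing rope_scaling entry untouched. The factor value
    is an assumption and should match the context-extension ratio of
    the checkpoint you downloaded.
    """
    config.setdefault("rope_scaling", {"type": "linear", "factor": factor})
    return config

# Usage on a local checkpoint (path is hypothetical):
# with open("Llama-2-7b-longlora/config.json") as f:
#     cfg = patch_rope_scaling(json.load(f))
# with open("Llama-2-7b-longlora/config.json", "w") as f:
#     json.dump(cfg, f, indent=2)
```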
My CUDA version is 11.2, so I can't install FlashAttention on my machine. I tried setting use_flash_attn to False when running fine-tune.py, but I hit this error be...
Hi, when saving a checkpoint, a very large global_step file is saved automatically. What is this file for? Can I skip saving it? It takes up too much disk space.
### Overview - This PR originates from https://github.com/dvlab-research/LongLoRA/issues/123 - I also faced similar problems with it, but no one ever made commits for it... - I added a callback...
Hi, I have a question regarding the results in Table 8 and Table 14 (05 Dec., 2023 version). In Table 8, for 7B context length 8192, the ppl for full...
Thanks for this great work! I have several questions regarding the datasets and the corresponding models: Q1: I think you have used RedPajama for FT and LongAlpaca-12k for SFT. You...