
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Results: 52 LongLoRA issues

### I followed the steps in the readme but encountered the following errors during SFT: [WARNING] async_io requires the dev libaio .so object and headers but these were not found....

Hi, thanks for the great work. I have a question regarding the **training set used** for the different types of models (**fully fine-tuned, LoRA+, and the models for the extra experiments in the paper**). In the...

![image](https://github.com/dvlab-research/LongLoRA/assets/55049714/132725db-b63a-42ac-af95-f3c60caacde3) How did you create the answers for a pair of papers being compared, such as the following example? Were they generated with GPT, or written by a human?

First, I ran the commands as follows:
```
CUDA_VISIBLE_DEVICES=1 torchrun --nproc_per_node=1 --master_port=29501 supervised-fine-tune.py \
    --model_name_or_path /mnt/42_store/lhj/data/mllm/model_weights/Llama-2-7b-chat-hf \
    --bf16 True \
    --output_dir outputs \
    --model_max_length 16384 \
    --use_flash_attn True \
    --data_path...
```

Some of your uploaded Hugging Face models lack the `rope_scaling` parameter in the config. Without `rope_scaling`, the model generates `" " " " " "`. `"rope_scaling": {"factor": 2.0,...
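A minimal sketch of the workaround described in this issue: patching a config dict that is missing `rope_scaling` before saving or loading the model. The example config keys and the factor 2.0 follow the snippet quoted above; in general the factor is the extended context length divided by the base model's original context length (this is an illustration, not the repository's official fix).

```python
import json

# Example config dict standing in for a checkpoint's config.json
# (hypothetical values for illustration).
config = {"model_type": "llama", "max_position_embeddings": 8192}

# If the uploaded checkpoint lacks `rope_scaling`, add it manually.
# factor = target_context_length / original_context_length,
# e.g. 8192 / 4096 = 2.0 as in the snippet quoted in the issue.
if "rope_scaling" not in config:
    config["rope_scaling"] = {"type": "linear", "factor": 2.0}

print(json.dumps(config["rope_scaling"]))
```

To patch an actual checkpoint, the same keys would be written back into its `config.json` before calling `from_pretrained`.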

![image](https://github.com/dvlab-research/LongLoRA/assets/147307433/a149481e-9bc5-4389-9058-d5e0dae83aef) My CUDA version is 11.2, so I can't install Flash Attention on my machine. I tried setting `use_flash_attn` to False when executing fine-tune.py, but I met this error be...

Hello, when saving a checkpoint, a very large global_step file is automatically saved as well. What is this file used for? Can I skip saving it? It takes up too much storage.

### Overview - This PR originates from https://github.com/dvlab-research/LongLoRA/issues/123 - I also faced similar problems, but no one had ever committed a fix for it... - I added a callback...

Hi, I have a question regarding the results in Table 8 and Table 14 (5 Dec 2023 version). In Table 8, for the 7B model at context length 8192, the ppl for full...

Thanks for this great work! I have several questions regarding the datasets and the corresponding models. Q1: I believe you used RedPajama for FT and LongAlpaca-12k for SFT. You...