
Issues about Applying Delta

Open · authurlord opened this issue 1 year ago · 1 comment

Thanks for your great work. However, when checking your Hugging Face page https://huggingface.co/YuxinJiang/Lion, I found that your delta weights are 25G, which is typically the size of a LLaMA-13B model (the same as vicuna-13b-delta-v1.1), while 7B delta weights are usually 12-13G (the same as vicuna-7b-delta-v1.1). I also hit an error in the integrity check of src/weight_diff.py when merging your delta weights with the LLaMA-7B base model.

Besides, what are the minimal hardware requirements for reproducing your model? Is it feasible on consumer hardware, e.g., 4×3090 or 4×V100, or is 8×A100 the minimum? Do you plan to support Accelerate/DeepSpeed for offloading/quantization during training? Also, your full training pipeline doesn't seem to support low-resource fine-tuning techniques such as LoRA/P-Tuning; is support planned?

Many thanks for your reply!

authurlord · May 26 '23 16:05

Hi, thanks for your interest in our work. The delta weights are 25G because we used float32 as the torch_dtype, whereas vicuna-7b-delta-v1.1 uses float16. We have converted them to float16 and re-uploaded the delta weights to Hugging Face (https://huggingface.co/YuxinJiang/Lion); please check again.
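
For reference, a minimal sketch of what such a conversion looks like (assuming the standard transformers API; the output directory is a placeholder, and this is not our official script):

```python
# Rough sketch: cast the delta weights to float16 before uploading,
# which roughly halves the checkpoint size (~25G float32 -> ~13G float16 for a 7B model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "YuxinJiang/Lion",          # delta weights on the Hugging Face Hub
    torch_dtype=torch.float16,  # load/cast to half precision instead of float32
)
tokenizer = AutoTokenizer.from_pretrained("YuxinJiang/Lion")

# Saving after the cast writes float16 shards.
model.save_pretrained("lion-7b-delta-fp16")
tokenizer.save_pretrained("lion-7b-delta-fp16")
```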

Besides, the integrity check was copied from Stanford Alpaca and is not applicable to our case. We have modified the code, so you can now merge our delta weights with the LLaMA-7B base model successfully.
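
Conceptually, the merge step just adds each delta tensor to the corresponding LLaMA-7B base tensor. A simplified sketch of the idea (not the actual src/weight_diff.py; the base-model path is a placeholder):

```python
# Simplified sketch of delta merging: recovered = base + delta, parameter by parameter.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("YuxinJiang/Lion", torch_dtype=torch.float16)

base_state = base.state_dict()
for name, param in delta.state_dict().items():
    # Shapes must match; a mismatch here is what an integrity check would flag.
    assert name in base_state and base_state[name].shape == param.shape
    param.add_(base_state[name])  # in-place: the delta tensor now holds base + delta

delta.save_pretrained("lion-7b-recovered")
```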

Currently, fine-tuning our model requires at least 8 × 48GB GPUs for about 7 hours. Our framework also supports Accelerate/DeepSpeed as well as LoRA/P-Tuning. We will update the related instructions later.
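
As an illustration of the low-resource route, a minimal LoRA setup with the peft library might look like the following (hyperparameters and the base-model path are placeholders, not our exact recipe):

```python
# Minimal LoRA illustration with peft: only small low-rank adapter matrices are trained,
# which cuts the memory footprint compared with full fine-tuning.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (placeholder value)
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted for LLaMA
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable
```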

Thanks for your valuable feedback!

YJiangcm · May 27 '23 06:05