Lion
Issues about Applying Delta
Thanks for your great work. However, when checking your Hugging Face page https://huggingface.co/YuxinJiang/Lion, I found that your delta weights are 25G, which is typically the size of a LLaMA-13B model (the same as vicuna-13b-delta-v1.1), whereas 7B delta weights are usually 12-13G (the same as vicuna-7b-delta-v1.1). I also run into an integrity-check error in src/weight_diff.py when merging your delta weights with the LLaMA-7B base model.
Besides, what is the minimal hardware requirement for reproducing your model? Is it feasible on consumer hardware, e.g. 4×3090 or 4×V100, or is 8×A100 the minimum? Do you plan to support Accelerate/DeepSpeed for offloading/quantized training? Also, it seems your full training pipeline does not support low-resource fine-tuning techniques such as LoRA/P-Tuning?
Many thanks for your reply!
Hi, thanks for your interest in our work. The delta weights are 25G because we used float32 as the torch_dtype, while vicuna-7b-delta-v1.1 uses float16. We have converted them to float16 and re-uploaded the delta weights to Hugging Face at https://huggingface.co/YuxinJiang/Lion, please check again.
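For reference, the conversion amounts to something like the sketch below, using the standard transformers API; the local paths are illustrative, not the exact paths in our repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the float32 delta checkpoint, casting it to half precision on load.
delta = AutoModelForCausalLM.from_pretrained(
    "path/to/lion-7b-delta-fp32",   # illustrative local path
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("path/to/lion-7b-delta-fp32")

# Re-save; the half-precision copy is roughly half the size on disk.
delta.save_pretrained("path/to/lion-7b-delta-fp16")
tokenizer.save_pretrained("path/to/lion-7b-delta-fp16")
```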
Besides, the integrity check was copied from Stanford Alpaca and is not applicable to our case. We have modified the code, so you can now merge our delta weights with the LLaMA-7B base model successfully.
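Conceptually, the recovery step just adds each delta tensor to the corresponding base tensor, in the spirit of the Alpaca-style weight_diff recovery. The sketch below is illustrative, not the exact code in src/weight_diff.py; it assumes a local LLaMA-7B checkpoint in Hugging Face format, and the paths are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative paths: a local LLaMA-7B checkpoint in HF format and an output dir.
BASE = "path/to/llama-7b-hf"
OUT = "path/to/lion-7b-recovered"

base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, low_cpu_mem_usage=True
)
delta = AutoModelForCausalLM.from_pretrained(
    "YuxinJiang/Lion", torch_dtype=torch.float16, low_cpu_mem_usage=True
)

# Add each base tensor onto the corresponding delta tensor in place,
# so `delta` ends up holding base + delta, i.e. the fine-tuned weights.
base_state = base.state_dict()
for name, param in delta.state_dict().items():
    param.add_(base_state[name])

delta.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```

Releasing only the delta lets users reconstruct the full model locally without redistributing the original LLaMA weights.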
Currently, the minimum setup for fine-tuning our model is 8 × 48GB GPUs for roughly 7 hours. Our framework also supports Accelerate/DeepSpeed as well as LoRA/P-Tuning; we will update the related instructions later.
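As a rough idea of the LoRA path, the sketch below wraps the base model with a LoRA adapter via the peft library; the hyperparameters and target modules are illustrative assumptions, not our released configuration.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",   # illustrative local path
    torch_dtype=torch.float16,
)

# Illustrative LoRA settings; adjust the rank and target modules to your memory budget.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```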
Thanks for your valuable feedback!