牛宇霖

Results: 5 issues by 牛宇霖

**Describe the bug** I can use my script to finetune a model with ZeRO-2 and ZeRO-3. However, when I use ZeRO-Infinity to offload parameters, this error occurs: python: /opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/aio/common/deepspeed_aio_common.cpp:125: int...

bug
training
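
For context, ZeRO-Infinity parameter offloading is driven by the DeepSpeed config. Below is a minimal sketch of such a config written out from Python; the NVMe path, buffer sizes, and batch size are placeholder assumptions, since the excerpt does not include the actual config used. The failing file in the traceback, deepspeed_aio_common.cpp, belongs to DeepSpeed's async I/O layer, which the `aio` block here configures.

```python
# Minimal sketch of a ZeRO-Infinity (ZeRO-3 + NVMe offload) config.
# Paths and sizes are placeholders; the issue does not show the real config.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",            # offload parameters to NVMe (ZeRO-Infinity)
            "nvme_path": "/local_nvme",  # placeholder; must point at a real NVMe mount
            "pin_memory": True,
        },
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/local_nvme",  # placeholder
        },
    },
    "aio": {
        # The assertion above fires inside DeepSpeed's async I/O code,
        # which this block configures.
        "block_size": 1048576,
        "queue_depth": 8,
        "single_submit": False,
        "overlap_events": True,
    },
    "bf16": {"enabled": True},
}

with open("ds_config_zero_infinity.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```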

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction Hello, I trained a LoRA model via SFT that works correctly. Now I want to further improve the LoRA model with a DPO training stage. However, after training with the script below, the output is garbled (randomly repeated numbers or strings). I have checked the dataset repeatedly and it is fine. Where did I go wrong? Also, my goal is to continue training the LoRA adapter, and I want the training output to be the optimized LoRA model. The adapter_name_or_path parameter is described as the path to the SFT checkpoint; should I put the LoRA model there, or the model obtained by merging the LoRA adapter with the base model? Thank you very much! > CUDA_VISIBLE_DEVICES=0 deepspeed --num_gpus=1 /root/LLaMA-Factory/src/train_bash.py \...

pending
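
For reference, the sketch below shows how a LoRA adapter is attached to a base model with peft and how that differs from a merged model; the paths are placeholders, and this only illustrates the adapter-vs-merged distinction raised in the question, not LLaMA-Factory's internal handling of adapter_name_or_path.

```python
# Sketch of loading a base model plus a LoRA adapter with peft.
# Paths are placeholders; this is not LLaMA-Factory's internal logic.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/path/to/base_model"    # placeholder: the original base model
adapter_path = "/path/to/sft_lora"   # placeholder: the SFT LoRA adapter output dir

tokenizer = AutoTokenizer.from_pretrained(base_path)
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto")

# Attach the LoRA adapter; only the adapter weights live in adapter_path.
model = PeftModel.from_pretrained(base, adapter_path)

# If a standalone merged model is wanted instead, peft can fold the adapter
# into the base weights and save the result.
merged = model.merge_and_unload()
merged.save_pretrained("/path/to/merged_model")  # placeholder output dir
```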

### Please ask your question After fine-tuning llama2 with paddlenlp, how can I convert the model back to torch weights?

question
stale
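
No answer appears in the excerpt, but the usual shape of such a conversion is: load the Paddle state dict, map parameter names to the torch naming, transpose nn.Linear weights (Paddle stores them as (in_features, out_features), torch as (out_features, in_features)), and save with torch. A rough sketch under those assumptions, with the model-specific key renaming left out:

```python
# Rough sketch of converting a Paddle checkpoint to torch weights.
# The model-specific key renaming is omitted (assumption); only the tensor
# conversion and the Linear-weight transpose are shown.
import paddle
import torch

paddle_state = paddle.load("model_state.pdparams")  # placeholder filename

torch_state = {}
for name, value in paddle_state.items():
    # paddle.load may yield paddle.Tensor or numpy arrays depending on version.
    array = value.numpy() if hasattr(value, "numpy") else value
    # Paddle nn.Linear keeps weight as (in_features, out_features); torch
    # expects (out_features, in_features). Which keys are Linear weights is
    # model-specific -- this ndim-based heuristic is only an assumption.
    if name.endswith(".weight") and array.ndim == 2 and "embed" not in name:
        array = array.T
    torch_state[name] = torch.from_numpy(array)

torch.save(torch_state, "pytorch_model.bin")
```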

I ran the command "CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python" in a Kaggle 2xT4 environment. It worked before, but today, when I ran the same command, the error below occurred. Could you please tell...

bug-unconfirmed
stale
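
The build error itself is truncated above, so nothing below addresses it; as a sanity check that a cuBLAS-enabled wheel actually offloads to the GPU, one can load a model with n_gpu_layers set and watch the startup log. The GGUF path is a placeholder.

```python
# Quick check that a llama-cpp-python build can offload layers to the GPU.
# The GGUF path is a placeholder; this does not diagnose the build failure above.
from llama_cpp import Llama

llm = Llama(
    model_path="/kaggle/input/some-model/model.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers; only effective if built with CUDA/cuBLAS
    n_ctx=2048,
    verbose=True,     # the startup log reports whether a CUDA backend was found
)
print(llm("Hello", max_tokens=8))
```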

**Describe the bug** Hi, when I'm fine-tuning Gemma, the checkpoint size was a fixed value at the beginning. Then it grew bigger and bigger. Finally, when it reached 5.99GB, it...
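
The excerpt is cut off, but one way to see which artifacts account for the growth is to list per-file sizes inside each checkpoint directory; the output directory below is a placeholder, since the issue does not name it.

```python
# Sketch: report the size of each file inside every checkpoint directory, to
# see which artifacts (model shards, optimizer state, ...) account for the growth.
# The output directory name is a placeholder.
import os

output_dir = "./output"  # placeholder: the training output dir
for ckpt in sorted(d for d in os.listdir(output_dir) if d.startswith("checkpoint-")):
    ckpt_dir = os.path.join(output_dir, ckpt)
    total = 0
    for root, _, files in os.walk(ckpt_dir):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            total += size
            print(f"  {name}: {size / 2**30:.2f} GiB")
    print(f"{ckpt}: {total / 2**30:.2f} GiB total")
```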