xlz issues

Results: 6 issues of xlz

[BUG]: Fine-tune Colossal-LLaMA-2 error

### 🐛 Describe the bug I run `colossalai run --nproc_per_node 8 finetune.py \ --plugin "gemini_auto" \ --dataset "/home/pdl/xlz/ColossalAI/data" \ --model_path "/home/pdl/xlz/pretrain_weights/Colossal-LLaMA-2-7b-base" \ --task_name "qaAll_final.jsonl" \ --save_dir "./output" \ --flash_attention \...

bug

SFT of llama3-8B with DeepSpeed ZeRO-3 is very slow; GPU utilization is 100% but power draw is very low

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction bash examples/lora_multi_gpu/ds_zero3.sh ### Expected behavior zero2 trains normally: 10 steps took 6 **seconds**. zero3 is very slow: 10 steps took 10 **minutes**, with GPU utilization at 100% but very low power draw ![image](https://github.com/hiyouga/LLaMA-Factory/assets/37104373/80633f9e-c0c3-4624-b2ba-da336a3fe0c6)...
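To put the two timings above on the same scale, here is a minimal sketch that converts them to seconds per step and computes the slowdown. The only inputs are the figures reported in the issue (10 steps in 6 seconds with zero2, 10 steps in 10 minutes with zero3); the function and variable names are illustrative and not part of LLaMA-Factory or DeepSpeed.

```python
def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Average wall-clock time for one training step."""
    return total_seconds / steps


# Figures as reported in the issue.
zero2_step = seconds_per_step(total_seconds=6, steps=10)        # zero2: 10 steps in 6 s
zero3_step = seconds_per_step(total_seconds=10 * 60, steps=10)  # zero3: 10 steps in 10 min

# How many times slower zero3 is than zero2 per step.
slowdown = zero3_step / zero2_step

print(f"zero2: {zero2_step:.1f} s/step")
print(f"zero3: {zero3_step:.1f} s/step")
print(f"slowdown: about {slowdown:.0f}x")
```

On these numbers zero3 is roughly two orders of magnitude slower per step, which is far more than a small constant overhead would explain.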

How can I run multi-node, multi-GPU inference?

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction I want to run inference on multiple GPUs across nodes `CUDA_VISIBLE_DEVICES=0,1 python ../../src/cli_demo.py \ --model_name_or_path /app/lcl/model_hub/LLM-Research/Meta-Llama-3-70B-Instruct \ --template default \ --infer_backend vllm` ###...

Llama3-8B prediction is very slow, even slower than LoRA fine-tuning; it becomes fast once adapter_name_or_path is added

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction I ran the following prediction command to evaluate llama3-8B. Without adapter_name_or_path the speed is 5 seconds per iteration; with it, the speed is 3 iterations per second `CUDA_VISIBLE_DEVICES=1 llamafactory-cli train examples/lora_single_gpu/llama3_lora_predict.yaml` ``` # model model_name_or_path: /app/lcl/model_hub/LLM-Research/Meta-Llama-3-8B-Instruct # adapter_name_or_path: saves/llama3-8b/lora/sft...

When should I fine-tune, and when should I use RAG?

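I have more than 50,000 domain-specific question-answer pairs. With RAG, nearly all of them are answered correctly. With LoRA fine-tuning, I trained on all of the data until the model overfit, and its answers are still not always correct. (I am not sure this approach is right: if I split the data into training, validation, and test sets, the model either does not converge enough or overfits, and results on the test set are also poor.) Is this case simply unsuitable for fine-tuning, or is my fine-tuning method wrong? In general, when should LoRA fine-tuning be used?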
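For readers unfamiliar with the split mentioned in the question, here is a minimal sketch of a reproducible training/validation/test split using only the Python standard library. The 80/10/10 ratio, the seed, the `split_dataset` helper, and the use of integer indices as stand-ins for the Q&A records are all assumptions for illustration; the issue does not state how the author split the data.

```python
import random


def split_dataset(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/val/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG, so the split is reproducible

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical stand-in for 50,000 Q&A records: one integer id per record.
train, val, test = split_dataset(range(50_000))
print(len(train), len(val), len(test))
```

Keeping the test portion out of training is what lets test accuracy say something about unseen questions; training on every record, as described in the question, leaves nothing to measure that with.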

wontfix

Why did you choose fine-tuning rather than an LLM plus a knowledge base?

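An LLM plus a knowledge base seems to have clear advantages for keeping data up to date and for reducing hallucinations. Why did you originally choose fine-tuning instead? Does the LLM plus knowledge base approach have problems that fine-tuning is needed to solve?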