KABI

Results: 5 issues by KABI

Hi, @iofu728. It seems the open-source dataset “episode-data” is the arXiv version of Few-NERD? I found that the reproduced results are very different from those in the paper, maybe...
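If the question is whether a local “episode-data” dump really matches the Few-NERD arXiv release, a direct way to check is to checksum both copies file by file. This is a minimal sketch only; the directory names are assumptions, not the datasets' real layout:

```
# Checksum every file in each copy, with directory prefixes stripped so
# the two lists are comparable; "episode-data" and "few-nerd-arxiv" are
# placeholder paths for wherever the two releases were downloaded.
(cd episode-data   && md5sum *) > a.txt
(cd few-nerd-arxiv && md5sum *) > b.txt
diff a.txt b.txt && echo "files are identical"
```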

### Reminder

- [X] I have read the README and searched the existing issues.

### Reproduction

    deepspeed --num_gpus=8 --master_port=9901 src/train_bash.py \
        --deepspeed /cpfs01/shared/Group-m6/dongguanting.dgt/LLaMA-Factory-main/config/ds_2.json \
        --stage sft \
        --do_train \
        --model_name_or_path...
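The preview cuts off before showing `ds_2.json`. As a hedged sketch only (the issue's actual config is not visible here), a minimal DeepSpeed ZeRO stage-2 file of the kind this command expects usually looks like the following; the `"auto"` values defer to the HF Trainer integration:

```
# Hypothetical minimal ZeRO stage-2 config; not the author's actual ds_2.json.
cat > config/ds_2.json <<'EOF'
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" },
  "fp16": { "enabled": "auto" }
}
EOF
```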

pending

### Reminder

7B inference has always worked fine, but the 13B model suddenly errors out.

- [X] I have read the README and searched the existing issues.

### Reproduction

    template=llama2
    path_to_llama_model=my model path
    datasets=("nq_data")
    for ((i=0; i

> Using auto half precision backend...

### Reminder

- [X] I have read the README and searched the existing issues.

### Reproduction

```
template=qwen
path_to_llama_model=/cpfs01/shared/Group-m6/dongguanting.dgt/LLaMA_Factory_sft/checkpoint/Qwen1.5-0.5B_hq_dense_train_all
datasets=("hq_dense_top_2" "hq_dense_top_3_120")
for ((i=0; i
```
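Both previews above stop at `for ((i=0; i`, almost certainly because the `<` in the loop condition was interpreted as an HTML tag and everything up to the next `>` was dropped. The loop body is not recoverable from the previews; purely as an illustration of the bash construct the fragments match, an index-based array loop looks like this (the `echo` is a placeholder, not the author's command):

```
template=qwen
datasets=("hq_dense_top_2" "hq_dense_top_3_120")

# Index-based iteration over a bash array; the "<" below is the character
# the issue previews swallowed. The real loop body is elided upstream.
for ((i=0; i<${#datasets[@]}; i++)); do
  echo "running dataset ${datasets[$i]} with template=${template}"
done
```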

pending

> (WorkerDict pid=59715) Qwen2ForCausalLM contains 494.03M parameters
> (WorkerDict pid=59715) Before building vllm rollout, memory allocated (GB): 0.9203834533691406, memory reserved (GB): 2.62890625
> (WorkerDict pid=59715) INFO 03-04 17:15:40 config.py:1005] Chunked prefill is...
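The allocated/reserved figures in this log come from PyTorch's CUDA allocator, measured just before the vLLM rollout engine claims its share of the GPU. When checking how much headroom is actually left on the device, the driver-level view is often more informative (a sketch; the query fields are standard nvidia-smi options):

```
# Driver-level GPU memory report; includes the CUDA context and allocator
# cache, so it can exceed the "memory allocated" figure in the log above.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```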