CPM-Bee issues

Version problem with the bmtrain package

9

Environment: Ubuntu Server 22.04, conda, Python 3.10.0, NVIDIA driver 12.1.0. Attempt 1: `pip install -r requirements.txt` fails with: ``` Collecting torch=1.10 Using cached torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl (887.5 MB) Collecting bmtrain>=0.2.1 Using cached bmtrain-0.2.2.tar.gz (58...

Lufffya

Environment

Single-GPU fine-tuning does not output a fine-tuned model

6

Fine-tuning command: torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py --use-delta --model-config config/cpm-bee-10b.json --dataset ../tutorials/basic_task_finetune/bin_data/train --eval_dataset ../tutorials/basic_task_finetune/bin_data/eval --epoch 5 --batch-size 5 --train-iters 100 --save-name cpm_bee_finetune --max-length 32 --save results/ --lr 0.0001 --inspect-iters 100...

ivancr7

BMTrain is hard to install; could you publish specific environment requirements?

8

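I have tried many versions and still cannot tell what is going wrong; it simply will not install. Sometimes the error says torch is missing, sometimes it complains about gcc. I have reinstalled several times. I am working in a Docker environment, and after two days of struggling the environment is still not set up. Many thanks in advance.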

xiaoguaishoubaobao

Environment
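For the BMTrain installation report above, where the error alternates between a missing torch and a gcc problem, a small diagnostic can make such a report easier to act on. The following is a minimal sketch, not an official CPM-Bee or BMTrain tool: the helper names `tool_version` and `collect_env` are hypothetical, and the sketch only assumes that the tools checked answer `--version`. It does not check any particular version requirement, since the source does not state one.

```python
import importlib.util
import platform
import shutil
import subprocess


def tool_version(name):
    """Return the first line of `<name> --version`, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        out = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = (out.stdout or out.stderr).strip().splitlines()
    return lines[0] if lines else None


def collect_env():
    """Gather the facts an installation bug report usually needs."""
    info = {
        "python": platform.python_version(),
        "torch": None,        # filled in only if torch is importable
        "torch_cuda": None,   # CUDA version torch was built against, if any
        "gcc": tool_version("gcc"),
        "nvcc": tool_version("nvcc"),
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value if value is not None else 'not found'}")
```

Running this inside the same Docker container and conda environment used for `pip install`, and pasting its output into the issue, shows at a glance whether torch is visible to the interpreter and whether a compiler is on `PATH`.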

Does finetune_cpm_bee.py currently support model-parallel training, and how should the arguments be set?

The machine has four 3090 GPUs, but running delta (incremental) fine-tuning with the following command fails: ``` export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py --use-delta --model-config /home/CPM-Bee/src/config/cpm-bee-10b.json xxxxx ``` ``` OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 23.70...

ryzn0518
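As rough context for the out-of-memory report above, the arithmetic below estimates how much memory the weights alone would occupy. It is a back-of-the-envelope sketch under stated assumptions: a nominal 10 billion parameters (taken from the `cpm-bee-10b` name) stored in 2-byte half precision. The function names are hypothetical, and the estimate ignores gradients, optimizer state, activations, and CUDA context overhead. Whether the fine-tuning script actually shards weights across GPUs is not established by the report, so the even-split figure is only an illustration.

```python
GIB = 2 ** 30  # bytes per GiB


def param_memory_gib(n_params, bytes_per_param=2):
    """Memory for the raw weights only, in GiB (default: 2-byte half precision)."""
    return n_params * bytes_per_param / GIB


def per_gpu_share_gib(total_gib, n_gpus):
    """Hypothetical per-GPU share if the weights were split evenly across GPUs."""
    return total_gib / n_gpus


# Assumption: a nominal 10 billion parameters in half precision.
total = param_memory_gib(10e9)
print(f"weights only: {total:.2f} GiB")                              # 18.63 GiB
print(f"even split over 4 GPUs: {per_gpu_share_gib(total, 4):.2f} GiB")  # 4.66 GiB
```

Under these assumptions, an unsharded copy of the weights would by itself take most of the 23.70 GiB that the error message reports for GPU 0, before any activations are allocated.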

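Commercial license email

Hello, how long does it usually take to get a reply about commercial licensing? I sent the email several days ago. Thanks.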

eatingbread

What causes the error when installing requirements.txt on Linux? The same error occurs on Windows

1

The error output is below: ![image](https://github.com/OpenBMB/CPM-Bee/assets/21216881/ccbb767c-8728-4887-aae2-037d9b7ca126) **** `Collecting torch=1.10 Downloading torch-1.13.1-cp39-cp39-manylinux1_x86_64.whl (887.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.4/887.4 MB ? eta 0:00:00 Collecting bmtrain>=0.2.1 Downloading bmtrain-0.2.2.tar.gz (58 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.7/58.7 kB 28.4 kB/s eta 0:00:00 Preparing metadata...

xiaoguaishoubaobao

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

I ran into this problem while fine-tuning on a text generation task; the error indicates that a child process failed.

NoraNotDora

Training hangs with no response

Script used: ```` #! /bin/bash export CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS_PER_NODE=4 NNODES=1 MASTER_ADDR="localhost" MASTER_PORT=12346 OPTS="" OPTS+=" --use-delta" OPTS+=" --model-config /home/zyz/.cache/modelscope/hub/OpenBMB/cpm-bee-10b/config/cpm-bee-10b.json" OPTS+=" --dataset /home/zyz/cpm/datasets/dataset.json" OPTS+=" --eval_dataset /home/zyz/cpm/datasets/step/data" OPTS+=" --epoch 100" OPTS+=" --batch-size 5" OPTS+=" --train-iters...

sjsj102323