
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical large language model, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Results: 45 MedicalGPT issues

ValueError: The state dictionary of the model you are trying to load is corrupted. Are you sure it was properly saved?

bug

### Describe the Question After continued pretraining on medical data, merging the trained model with llama-7b using merge_peft_adapter.py fails with the problem below. ### Describe your attempts Traceback (most recent call last): File "/root/nas/llm-prompt/MedicalGPT-main/scripts/merge_peft_adapter.py", line 102, in main() File "/root/nas/llm-prompt/MedicalGPT-main/scripts/merge_peft_adapter.py", line 79, in main tokenizer = tokenizer_class.from_pretrained(peft_model_path, trust_remote_code=True)...

question

### Describe the Question Please provide a clear and concise description of what the question is. ### Describe your attempts - [ ] I walked through the tutorials - [...

question

### Describe the Question Please provide a clear and concise description of what the question is. After running run_pt.sh directly, the model loads normally, but the process hangs at the data-loading step: no error is raised, no progress is made, and GPU memory usage stays frozen. ### Describe your attempts - [ ] I walked through the tutorials -...

question

### Describe the Question Please provide a clear and concise description of what the question is. ### Describe your attempts - [ ] I walked through the tutorials - [...

question

### Describe the Question Please provide a clear and concise description of what the question is. Single-GPU training works, but single-node multi-GPU training does not. The training command is: CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 1 pretraining.py \ --model_type chatglm \ --model_name_or_path...
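One thing worth noting about the command quoted above: it exposes two GPUs (`CUDA_VISIBLE_DEVICES=0,1`) but launches only one worker (`--nproc_per_node 1`). For single-node multi-GPU training with torchrun, `--nproc_per_node` normally has to match the number of visible GPUs. A hedged sketch (`<path-to-model>` is a placeholder, not the reporter's actual path):

```shell
# One worker process per visible GPU: two GPUs exposed, two processes launched.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 2 pretraining.py \
    --model_type chatglm \
    --model_name_or_path <path-to-model>
```

Whether this alone fixes the hang depends on the rest of the setup, but a process count that disagrees with the GPU count is a common cause of multi-GPU launches stalling or behaving like single-GPU runs.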

question

Resuming a failed run raises the error below, even though the --overwrite_output_dir flag has already been removed. What could be the cause? Training takes a long time, so any interruption currently forces a restart from scratch. raise ValueError(f"Can't find a valid checkpoint at {resume_from_checkpoint}") ValueError: Can't find a valid checkpoint at checkpoint-8000. The checkpoint-8000 directory does contain these files: adapter_config.json adapter_model.bin optimizer.pt rng_state_0.pth rng_state_1.pth rng_state_2.pth rng_state_3.pth scaler.pt scheduler.pt trainer_state.json training_args.bin
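For errors like the one above, it can help to verify what the resume logic will actually see. A minimal sketch, assuming the standard `checkpoint-<step>` directory layout that the transformers Trainer writes, of locating the latest checkpoint directory before passing it to `resume_from_checkpoint` (the function name is illustrative, not part of MedicalGPT):

```python
import os
import re

# Trainer checkpoints are directories named "checkpoint-<global step>".
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def find_last_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step>
    subdirectory of output_dir, or None if there is none."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        full = os.path.join(output_dir, name)
        match = CHECKPOINT_RE.match(name)
        if match and os.path.isdir(full):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, full
    return best_path
```

Note that the path must point at the checkpoint directory itself (e.g. `outputs/checkpoint-8000`), not at the parent output directory, and it must be the same kind of path (relative vs. absolute) that the training script resolves at resume time.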

question

### Describe the Question When running full-parameter pretraining with the chatglm-6b-v0 model, --use_peft is set to False and the launch command is as follows: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node 8 pretraining.py \ --model_type chatglm \ --model_name_or_path /home/vca/lsg/ChatGPT/open-models/chatglm-6b-v0 \ --train_file_dir ../data/pretrain \ --validation_file_dir ../data/pretrain \ --per_device_train_batch_size 1 \ --per_device_eval_batch_size 1 \ --do_train...

question

### Describe the Question The error message is as follows: RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter...
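This "Expected to mark a variable ready only once" error typically appears under DistributedDataParallel when a parameter participates in the backward pass more than once, for example when gradient checkpointing or unused-parameter detection interacts badly with the model. A hedged sketch of the two transformers TrainingArguments knobs most often adjusted for it (whether they apply here depends on the actual training setup, which the issue excerpt does not show):

```python
from transformers import TrainingArguments

# Assumed mitigation, not MedicalGPT's confirmed fix: disable unused-parameter
# scanning under DDP, and try turning gradient checkpointing off, since the
# combination of the two is a frequent trigger of this RuntimeError.
args = TrainingArguments(
    output_dir="outputs",
    ddp_find_unused_parameters=False,
    gradient_checkpointing=False,
)
```

If gradient checkpointing must stay on to fit memory, it is worth testing the two flags independently to see which one the model actually needs.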

question

### Describe the Question How do I test a trained reward model?
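A common way to test a trained reward model is to score held-out (chosen, rejected) response pairs and measure how often the preferred response receives the higher reward. A minimal sketch of that comparison metric; the model-scoring step itself (running each pair through the reward model to get scalar scores) is assumed to have happened already:

```python
def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of held-out pairs in which the human-preferred (chosen)
    response got a strictly higher reward than the rejected response."""
    assert len(chosen_scores) == len(rejected_scores)
    if not chosen_scores:
        return 0.0
    wins = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return wins / len(chosen_scores)
```

A well-trained reward model should score well above 0.5 on this metric, since 0.5 is what random scoring would produce.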

question