
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical large language model, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Results: 45 MedicalGPT issues

ValueError: The state dictionary of the model you are trying to load is corrupted. Are you sure it was properly saved?

bug

### Describe the Question After continued pretraining on medical data, merging the trained model with llama-7b using merge_peft_adapter.py fails with the problem below. ### Describe your attempts Traceback (most recent call last): File "/root/nas/llm-prompt/MedicalGPT-main/scripts/merge_peft_adapter.py", line 102, in main() File "/root/nas/llm-prompt/MedicalGPT-main/scripts/merge_peft_adapter.py", line 79, in main tokenizer = tokenizer_class.from_pretrained(peft_model_path, trust_remote_code=True)...

question

### Describe the Question Please provide a clear and concise description of what the question is. ### Describe your attempts - [ ] I walked through the tutorials - [...

question

### Describe the Question Please provide a clear and concise description of what the question is. After running run_pt.sh directly, the model loads normally, but the process hangs at the data-loading step: no error is raised, no progress is made, and GPU memory usage stays frozen. ### Describe your attempts - [ ] I walked through the tutorials -...

question

### Describe the Question Please provide a clear and concise description of what the question is. ### Describe your attempts - [ ] I walked through the tutorials - [...

question

### Describe the Question Please provide a clear and concise description of what the question is. Single-GPU training works, but single-node multi-GPU training does not. The training command is: CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 1 pretraining.py \ --model_type chatglm \ --model_name_or_path...
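One thing worth noting about the command quoted above: it exposes two GPUs (`CUDA_VISIBLE_DEVICES=0,1`) but launches only one worker (`--nproc_per_node 1`). For single-node multi-GPU training with torchrun, `--nproc_per_node` normally has to match the number of visible GPUs. A hedged sketch (`<path-to-model>` is a placeholder, not the reporter's actual path):

```shell
# One worker process per visible GPU: two GPUs exposed, two processes launched.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 2 pretraining.py \
    --model_type chatglm \
    --model_name_or_path <path-to-model>
```

Whether this alone fixes the hang depends on the rest of the setup, but a process count that disagrees with the GPU count is a common cause of multi-GPU launches stalling or behaving like single-GPU runs.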

question

Resuming a failed run raises the error below, even though the --overwrite_output_dir flag has already been removed. What could be the cause? Training takes a long time, so any interruption currently forces a restart from scratch. raise ValueError(f"Can't find a valid checkpoint at {resume_from_checkpoint}") ValueError: Can't find a valid checkpoint at checkpoint-8000. The checkpoint-8000 directory does contain these files: adapter_config.json adapter_model.bin optimizer.pt rng_state_0.pth rng_state_1.pth rng_state_2.pth rng_state_3.pth scaler.pt scheduler.pt trainer_state.json training_args.bin
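For errors like the one above, it can help to verify what the resume logic will actually see. A minimal sketch, assuming the standard `checkpoint-<step>` directory layout that the transformers Trainer writes, of locating the latest checkpoint directory before passing it to `resume_from_checkpoint` (the function name is illustrative, not part of MedicalGPT):

```python
import os
import re

# Trainer checkpoints are directories named "checkpoint-<global step>".
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def find_last_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step>
    subdirectory of output_dir, or None if there is none."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        full = os.path.join(output_dir, name)
        match = CHECKPOINT_RE.match(name)
        if match and os.path.isdir(full):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, full
    return best_path
```

Note that the path must point at the checkpoint directory itself (e.g. `outputs/checkpoint-8000`), not at the parent output directory, and it must be the same kind of path (relative vs. absolute) that the training script resolves at resume time.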

question

### Describe the Question When running full-parameter pretraining with the chatglm-6b-v0 model, --use_peft is set to False and the launch command is as follows: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node 8 pretraining.py \ --model_type chatglm \ --model_name_or_path /home/vca/lsg/ChatGPT/open-models/chatglm-6b-v0 \ --train_file_dir ../data/pretrain \ --validation_file_dir ../data/pretrain \ --per_device_train_batch_size 1 \ --per_device_eval_batch_size 1 \ --do_train...

question

### Describe the Question The error message is as follows: RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter...
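This "Expected to mark a variable ready only once" error typically appears under DistributedDataParallel when a parameter participates in the backward pass more than once, for example when gradient checkpointing or unused-parameter detection interacts badly with the model. A hedged sketch of the two transformers TrainingArguments knobs most often adjusted for it (whether they apply here depends on the actual training setup, which the issue excerpt does not show):

```python
from transformers import TrainingArguments

# Assumed mitigation, not MedicalGPT's confirmed fix: disable unused-parameter
# scanning under DDP, and try turning gradient checkpointing off, since the
# combination of the two is a frequent trigger of this RuntimeError.
args = TrainingArguments(
    output_dir="outputs",
    ddp_find_unused_parameters=False,
    gradient_checkpointing=False,
)
```

If gradient checkpointing must stay on to fit memory, it is worth testing the two flags independently to see which one the model actually needs.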

question

### Describe the Question How do I test a trained reward model?
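A common way to test a trained reward model is to score held-out (chosen, rejected) response pairs and measure how often the preferred response receives the higher reward. A minimal sketch of that comparison metric; the model-scoring step itself (running each pair through the reward model to get scalar scores) is assumed to have happened already:

```python
def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of held-out pairs in which the human-preferred (chosen)
    response got a strictly higher reward than the rejected response."""
    assert len(chosen_scores) == len(rejected_scores)
    if not chosen_scores:
        return 0.0
    wins = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return wins / len(chosen_scores)
```

A well-trained reward model should score well above 0.5 on this metric, since 0.5 is what random scoring would produce.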

question