
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical large language model, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

45 MedicalGPT issues

### Describe the Question
Hello, I ran continued pretraining with LoRA on several txt documents, but merging the adapter fails with: `return config_cls(**kwargs) TypeError: LoraConfig.__init__() got an unexpected keyword argument 'layer_replication'`. My peft version is 0.9.0. How can I fix this?

question
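A likely cause is a peft version mismatch: the `layer_replication` field only exists in newer peft releases, so an adapter config saved by a newer peft cannot be parsed by `LoraConfig` in peft 0.9.0. Assuming that is what happened here, either upgrade peft (`pip install -U peft`) or strip the unknown key from the saved `adapter_config.json` before merging; a minimal sketch, with a hypothetical adapter path:

```python
import json
from pathlib import Path

# Hypothetical path to the LoRA adapter produced by continued pretraining.
adapter_dir = Path("outputs-pt-lora")
config_path = adapter_dir / "adapter_config.json"

config = json.loads(config_path.read_text(encoding="utf-8"))
# Drop fields that peft 0.9.0's LoraConfig does not recognize.
config.pop("layer_replication", None)
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
# Merging the adapter with peft 0.9.0 should then no longer raise the TypeError.
```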

### Describe the Question
After expanding the tokenizer vocabulary, can I go straight to SFT? If so, do any parameters need to change?

question
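In general, once the tokenizer has extra tokens, the model's embedding matrix (and tied LM head) must be resized to the new vocabulary size before any fine-tuning; whether additional script parameters are needed depends on the training script itself. A minimal sketch of the resize step with the Hugging Face API, using placeholder paths:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths: the base model plus a tokenizer that already contains the added tokens.
model = AutoModelForCausalLM.from_pretrained("path/to/base-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/extended-tokenizer")

# Resize the input embeddings (and tied LM head) to match the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))

model.save_pretrained("path/to/resized-model")
tokenizer.save_pretrained("path/to/resized-model")
# SFT can then start from "path/to/resized-model".
```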

During continued pretraining I used roughly 100k records of company profiles and business-scope descriptions (all plain domain text), but the loss keeps slowly increasing during training. Have you run into this? Launch command: CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 pretraining.py --model_type chatglm --model_name_or_path ../chatglm3-6b-32k --train_file_dir ./data/pretrain --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --do_train --do_eval --use_peft True --seed 42 --max_train_samples -1 --max_eval_samples -1 --num_train_epochs 3 --learning_rate...

question
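One common first check when continued-pretraining loss drifts upward is the learning-rate schedule: a peak rate that is too high for a LoRA run on a small domain corpus, or training without warmup, can push the loss up over time. Purely as an illustration (not the repo's recommended settings), the relevant Hugging Face `TrainingArguments` knobs are:

```python
from transformers import TrainingArguments

# Illustrative values only; tune them for your own data and hardware.
args = TrainingArguments(
    output_dir="outputs-pt-lora",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-5,           # try a smaller peak LR if the loss keeps climbing
    warmup_ratio=0.05,            # warm up before reaching the peak LR
    lr_scheduler_type="cosine",   # decay instead of holding the peak LR
    logging_steps=10,
)
```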

### Describe the Question
Have you done a comparison of results with and without continued pretraining? How is the effect of continued pretraining evaluated?

question
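A common way to quantify the effect of continued pretraining is perplexity on held-out domain text: evaluate the base model and the continued-pretrained (merged) model on the same eval texts and compare. A minimal sketch with placeholder paths and texts:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_path, texts):
    """Average perplexity of the model at `model_path` over held-out texts."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))

# Held-out domain paragraphs (placeholders) -- use the same set for both models.
held_out = ["some held-out medical paragraph ...", "another paragraph ..."]
print("base model perplexity:", perplexity("path/to/base-model", held_out))
print("after PT perplexity:  ", perplexity("path/to/pt-merged-model", held_out))
```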

Traceback (most recent call last):
  File "F:\xiazai\MedicalGPT-main\reward_modeling.py", line 653, in <module>
    main()
  File "F:\xiazai\MedicalGPT-main\reward_modeling.py", line 447, in main
    model = get_peft_model(model, peft_config)
  File "C:\Users\admin\.conda\envs\newrlhf\lib\site-packages\peft\mapping.py", line 136, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config,...

bug

Traceback (most recent call last):
  File "F:\xiazai\MedicalGPT-main\pretraining.py", line 781, in <module>
    main()
  File "F:\xiazai\MedicalGPT-main\pretraining.py", line 722, in main
    trainer = SavePeftModelTrainer(
  File "C:\Users\admin\.conda\envs\newrlhf\lib\site-packages\transformers\trainer.py", line 489, in __init__
    self._move_model_to_device(model, args.device)
  File "C:\Users\admin\.conda\envs\newrlhf\lib\site-packages\transformers\trainer.py",...

bug

![image](https://github.com/shibing624/MedicalGPT/assets/22641510/a6f8ce2f-d7cb-4b31-b016-26ba33a946b3) ![image](https://github.com/shibing624/MedicalGPT/assets/22641510/1dff598a-00e6-41b7-b334-09707b751ca5) Training stays stuck at 0, but GPU memory utilization is full; not sure why.

question

### Describe the Question
1. For the PT stage with a fairly large base model (Yi-67B), which multi-node multi-GPU training approach works best? Does the code support it?
2. Is DeepSpeed ZeRO-1 supported? How would I change the config? It looks like only ZeRO-2 and ZeRO-3 are provided.
3. For long-text training, is setting --group_by_text True enough? How long counts as long, and does block_size still take effect in that case?
4. What does block_size actually do?
Looking forward to your reply, many thanks!

question
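On point 2: whether a ready-made ZeRO-1 file ships with the repo I can't confirm, but with the Hugging Face Trainer the ZeRO stage is just a field in the DeepSpeed config, so a ZeRO-1 variant can be passed the same way as the existing ZeRO-2/ZeRO-3 configs. On point 4: in the usual run_clm-style preprocessing, block_size is the fixed chunk length (in tokens) that the concatenated, tokenized corpus is cut into, i.e. the effective training sequence length. A hedged sketch of a minimal ZeRO-1 config (the file name is just an example):

```python
import json

# Minimal DeepSpeed ZeRO-1 config; "auto" lets the HF Trainer fill in values
# from its own TrainingArguments.
zero1_config = {
    "zero_optimization": {"stage": 1},
    "bf16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}
with open("deepspeed_zero1_config.json", "w", encoding="utf-8") as f:
    json.dump(zero1_config, f, indent=2)

# Assuming the training script exposes the standard HF TrainingArguments,
# it can then be enabled with: --deepspeed deepspeed_zero1_config.json
```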

If I want to treat each QA pair as its own sample, instead of concatenating everything and splitting it into blocks, what should I change here?

question
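The usual change for this is in the dataset preprocessing: skip the concatenate-then-chunk (group_texts-style) step and tokenize each QA record on its own with truncation and padding to a fixed length. A hedged sketch, assuming a `text` column that holds one QA pair per row and an arbitrary maximum length:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/base-model")  # placeholder path
MAX_LENGTH = 512  # assumed per-sample length

def tokenize_per_sample(examples):
    """Tokenize each QA record independently instead of concatenating and chunking."""
    enc = tokenizer(
        examples["text"],          # assumed column: one QA pair per row
        truncation=True,
        max_length=MAX_LENGTH,
        padding="max_length",
    )
    # Causal-LM labels mirror the input ids; mask padding so it adds no loss.
    enc["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in ids]
        for ids in enc["input_ids"]
    ]
    return enc

# Typical usage with a datasets.Dataset:
# dataset = dataset.map(tokenize_per_sample, batched=True, remove_columns=dataset.column_names)
```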