JDY

Results: 5 comments by JDY

Hi, have you solved the issue? I have met the exact same problem.

> Hi, have you solved the issue? I have met the exact same problem. py3.10 llamafactory 0.7.2.dev0 torch 2.3.0 transformers 4.37.2 / 4.41 deepspeed 0.13.0/0.14.0 cuda 12.1 tried: 1. --bf16...

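> > Hi, have you solved the issue? I have met the exact same problem. > > I have just been trying this too. I found that whenever ref_model is loaded through _prepare_deepspeed, it raises an error; I tried both zero2 and zero3 at first : ( I went through the source code, and it looks like a trl problem: something goes wrong when it initializes deepspeed with ref_model. Also, my understanding is that ref_model should be frozen and not trained, but it simply sets ref_model=model without making a copy, and then initializes deepspeed directly with the ref model. If nothing else works, try quantizing the ref model to 4-bit or 8-bit, which skips deepspeed entirely.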
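To make the aliasing concern in the comment above concrete, here is a minimal sketch of the difference between `ref_model = model` and a separate frozen copy. It is not trl's code and does not involve DeepSpeed; the tiny `nn.Linear` module and the names `policy`, `aliased_ref`, and `frozen_ref` are hypothetical stand-ins chosen only for illustration.

```python
import copy

import torch
from torch import nn

# Hypothetical stand-in for the policy model being trained.
policy = nn.Linear(4, 2)

# Plain assignment: both names refer to the same module object.
aliased_ref = policy

# Separate reference copy, put in eval mode with gradients disabled.
frozen_ref = copy.deepcopy(policy)
frozen_ref.eval()
for param in frozen_ref.parameters():
    param.requires_grad_(False)

# Simulate a training update on the policy weights.
with torch.no_grad():
    policy.weight.add_(1.0)

# The alias follows the update; the deep copy keeps the original weights.
alias_follows = torch.equal(aliased_ref.weight, policy.weight)
copy_follows = torch.equal(frozen_ref.weight, policy.weight)
print(alias_follows, copy_follows)
```

Whether a deep copy is affordable depends on model size; for large models the quantized reference model suggested in the comment may be the more practical route.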

It may just be an offload problem; it worked for me with stage 3 and no offload.

```
deepspeed --master_port 25002 --include "localhost:4,5,6,7" src/train.py \
    --model_name_or_path ${model_path} \
    --stage 'dpo' \
    --do_train \
    --finetuning_type 'full' \
    --dpo_ftx 1.0 \
    --ddp_timeout 180000000 \
    --deepspeed examples/deepspeed/ds_z3_config.json...
```
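For readers unsure what "stage 3 without offload" means at the config level, the sketch below strips the two ZeRO offload sections from a DeepSpeed config dictionary. The starting config is a hypothetical example, not the contents of `examples/deepspeed/ds_z3_config.json` (which the comment does not show), and the helper `without_offload` is an illustrative name, not part of DeepSpeed or LLaMA-Factory.

```python
import copy
import json

# Hypothetical ZeRO-3 config with CPU offload enabled. This is NOT the
# actual ds_z3_config.json from the comment; values are illustrative.
zero3_with_offload = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# DeepSpeed config sections that move optimizer state or parameters off GPU.
OFFLOAD_KEYS = ("offload_optimizer", "offload_param")


def without_offload(config):
    """Return a copy of the config with the ZeRO offload sections removed."""
    result = copy.deepcopy(config)
    zero = result.get("zero_optimization", {})
    for key in OFFLOAD_KEYS:
        zero.pop(key, None)
    return result


zero3_no_offload = without_offload(zero3_with_offload)
print(json.dumps(zero3_no_offload, indent=2))
```

Dropping offload keeps all optimizer state and parameters on the GPUs, so it only helps if the model fits in GPU memory under ZeRO-3 partitioning.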

Same problem here. I used the instruction: (from the official dataset) What is the most comprehensive and efficient approach in Python to redact personal identifying information from a text string...