edc3000 comments

Results: 12 comments by edc3000

In chatglm, the result of tokenizer(question) is question+[gMASK]+<sop>, but if I tokenize it myself as [gMASK]+<sop>+question, are both approaches acceptable?

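I'd like to ask: I want to do simple instruction fine-tuning (e.g., "extract the person names from this title"), not dialogue, using the alpaca format. Is this how the tokenizer output should be laid out: instruction + input + [gMASK] + sop + answer?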
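For readers following along, here is a minimal sketch of the layout described in this comment (prompt, then [gMASK], then sop, then answer), with the loss masked on everything before the answer. The function name `build_sft_example` and all token ids are hypothetical placeholders; real ids come from the tokenizer, and whether this ordering is what a given ChatGLM version expects should be checked against that version's tokenizer. It also assumes the training code shifts labels internally, as Hugging Face causal LM heads do.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss ignores by default


def build_sft_example(prompt_ids, answer_ids, gmask_id, sop_id, eos_id):
    """Lay out one example as prompt + [gMASK] + sop + answer + eos.

    All ids are placeholders for illustration; take the real ones from
    the tokenizer of the model you are fine-tuning.
    """
    prefix = list(prompt_ids) + [gmask_id, sop_id]
    target = list(answer_ids) + [eos_id]
    input_ids = prefix + target
    # Only the answer (and the end token) contributes to the loss.
    labels = [IGNORE_INDEX] * len(prefix) + target
    return input_ids, labels


# Hypothetical ids: instruction + input tokenized together as the prompt.
input_ids, labels = build_sft_example(
    prompt_ids=[11, 12, 13],
    answer_ids=[21, 22],
    gmask_id=1,
    sop_id=2,
    eos_id=3,
)
```

`input_ids` and `labels` have the same length, so they can be padded and batched together.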

Async pipeline in generate and compute score

@mertunsall I have set `reward_model.launch_reward_fn_async=True` and `reward_model.reward_manager=prime`, but rewards are only calculated after all rollouts have completed. Even though reward calculation is asynchronous, this causes GPU utilization to be zero for extended...

Why does it take such a long time to perform SFT using LoRA?

Did you manage to solve this in the end?

A question: how can a reward model or a generative reward model be used in parallel during GRPO training? Is there a simple example?

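I have another question: reward computation seems to run serially, and GPU utilization is 0 while it runs, yet waiting for the genRM to score takes a long time. Is there a way, or an example, to do this asynchronously, e.g. have the genRM score one sample while the next one is being rolled out, instead of waiting for the whole rollout to finish before scoring?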
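To make the requested behaviour concrete, here is a minimal sketch of that kind of overlap using plain `asyncio`: scoring of one sample is started as a background task while the next sample is still being generated. It is not verl's API. `generate_rollout` and `score_with_genrm` are hypothetical stand-ins that only sleep, and a real trainer would generate in batches on the GPU rather than one prompt at a time.

```python
import asyncio


async def generate_rollout(prompt: str) -> str:
    """Hypothetical stand-in for the policy generating one response."""
    await asyncio.sleep(0.01)
    return f"response to {prompt}"


async def score_with_genrm(response: str) -> float:
    """Hypothetical stand-in for a slow generative reward model call."""
    await asyncio.sleep(0.03)
    return float(len(response))


async def rollout_and_score(prompts):
    """Generate samples one by one and score each as soon as it exists."""
    scoring_tasks = []
    for prompt in prompts:
        response = await generate_rollout(prompt)
        # Start scoring right away without awaiting it, so the next
        # rollout proceeds while this score is still being computed.
        scoring_tasks.append(asyncio.create_task(score_with_genrm(response)))
    # Collect the rewards in the same order as the prompts.
    return await asyncio.gather(*scoring_tasks)


rewards = asyncio.run(rollout_and_score(["a", "b", "c"]))
```

With this structure the total wall time is roughly the generation time plus one scoring call, rather than the sum of both for every sample.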

A question: how can a reward model or a generative reward model be used in parallel during GRPO training? Is there a simple example?

@Leon-LihongWang Hi, do I just need to follow what you said without changing other files like recipe/one_step_off_policy?

A question: how can a reward model or a generative reward model be used in parallel during GRPO training? Is there a simple example?

@Leon-LihongWang Hello, this is a very good solution. I can calculate genRM rewards asynchronously as you said, but there is still an efficiency bottleneck because this method is still synchronous....

DPO GPU memory is unevenly distributed

Same problem. How did everyone solve this? My version is 0.9.4.dev0.

[recipe] feat: asynchronous reward agent with mini-batch pipeline and one-step off-policy training

@haolinyan Good job! But I have an error after running your recipe. Error is "omegaconf.errors.ConfigAttributeError: Key 'ray_init' is not in struct" on recipe/async_reward_agent/main_ppo.py 226 `num_cpus=config.ray_kwargs.ray_init.num_cpus`. Can you help me?

[recipe] feat: asynchronous reward agent with mini-batch pipeline and one-step off-policy training

> > @haolinyan Good job! But I have an error after running your recipe. Error is "omegaconf.errors.ConfigAttributeError: Key 'ray_init' is not in struct" on recipe/async_reward_agent/main_ppo.py 226 `num_cpus=config.ray_kwargs.ray_init.num_cpus`. Can you help...