edc3000

12 comments by edc3000

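I'd like to ask: I want to do simple instruction fine-tuning (e.g., extracting the person names from a title), not dialogue, using the Alpaca format. Should the tokenizer lay out each sample like this: instruction + input + [gMASK] + sop + answer?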
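A minimal sketch of that layout on plain token-id lists, assuming the usual convention of masking prompt positions so that only the answer is trained on. The special-token ids (`GMASK_ID`, `SOP_ID`, `EOS_ID`) are hypothetical placeholders to be looked up in the actual tokenizer, the trailing end-of-sequence token is an added assumption not in the question, and -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# Hypothetical special-token ids; look up the real ones in your tokenizer.
GMASK_ID = 1
SOP_ID = 2
EOS_ID = 3


def build_example(instruction_ids: List[int], input_ids: List[int],
                  answer_ids: List[int], max_len: int = 512) -> Dict[str, List[int]]:
    """Lay out instruction + input + [gMASK] + sop + answer (+ assumed eos)."""
    prompt = instruction_ids + input_ids + [GMASK_ID, SOP_ID]
    target = answer_ids + [EOS_ID]
    ids = (prompt + target)[:max_len]
    # Prompt positions get IGNORE_INDEX, so only answer tokens add to the loss.
    labels = ([IGNORE_INDEX] * len(prompt) + target)[:max_len]
    return {"input_ids": ids, "labels": labels}


example = build_example([10, 11], [20], [30, 31])
print(example["input_ids"])  # [10, 11, 20, 1, 2, 30, 31, 3]
print(example["labels"])     # [-100, -100, -100, -100, -100, 30, 31, 3]
```

Truncating `input_ids` and `labels` with the same `max_len` keeps the two lists aligned position by position.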

@mertunsall I have set `reward_model.launch_reward_fn_async=True` and `reward_model.reward_manager=prime`, but rewards are only calculated after all rollouts have completed. Even though reward calculation is asynchronous, this causes GPU utilization to be zero for extended...

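I have another question. Reward computation seems to run serially, during which GPU utilization is 0, and waiting for the genRM to score takes a long time. Is there a way, or an example, of doing this asynchronously, e.g., having the genRM score each sample as soon as it is rolled out, instead of waiting for all rollouts to finish before scoring?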
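The per-sample pipelining asked about here can be sketched with `asyncio`: each prompt's reward is requested as soon as its own rollout finishes, so genRM scoring overlaps with rollouts still in progress, and a semaphore caps how many genRM requests run at once. `rollout` and `genrm_score` are hypothetical stubs that only sleep; this is not verl's reward manager or the `launch_reward_fn_async` mechanism, only an illustration of the scheduling pattern.

```python
import asyncio
import random
from typing import List, Tuple


async def rollout(prompt: str) -> str:
    """Hypothetical stand-in for generating one response with the policy."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated generation time
    return prompt + " -> answer"


async def genrm_score(response: str) -> float:
    """Hypothetical stand-in for scoring one response with a generative RM."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated judging time
    return float(len(response) % 7)


async def rollout_then_score(prompt: str, rm_slots: asyncio.Semaphore) -> float:
    response = await rollout(prompt)
    # Scoring starts as soon as this response exists, while other prompts
    # are still being rolled out.
    async with rm_slots:  # cap concurrent genRM requests
        return await genrm_score(response)


async def pipelined_rewards(prompts: List[str],
                            max_rm_concurrency: int = 4) -> List[Tuple[str, float]]:
    rm_slots = asyncio.Semaphore(max_rm_concurrency)
    # gather returns results in the same order as the prompts.
    rewards = await asyncio.gather(*(rollout_then_score(p, rm_slots) for p in prompts))
    return list(zip(prompts, rewards))


if __name__ == "__main__":
    print(asyncio.run(pipelined_rewards([f"q{i}" for i in range(8)])))
```

With a batched generation engine, responses tend to finish in groups rather than one by one, so in practice this pattern would apply per finished group.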

@Leon-LihongWang Hi, do I just need to follow what you said without changing other files like recipe/one_step_off_policy?

@Leon-LihongWang Hello, this is a very good solution. I can calculate genRM rewards asynchronously as you said, but there is still an efficiency bottleneck because this method is still synchronous....

Same problem here. How did you all solve it? My version is 0.9.4.dev0.

@haolinyan Good job! But I have an error after running your recipe. Error is "omegaconf.errors.ConfigAttributeError: Key 'ray_init' is not in struct" on recipe/async_reward_agent/main_ppo.py 226 `num_cpus=config.ray_kwargs.ray_init.num_cpus`. Can you help me?

> > @haolinyan Good job! But I have an error after running your recipe. Error is "omegaconf.errors.ConfigAttributeError: Key 'ray_init' is not in struct" on recipe/async_reward_agent/main_ppo.py 226 `num_cpus=config.ray_kwargs.ray_init.num_cpus`. Can you help...