LSC527
According to [Low-Precision Reinforcement Learning](https://arxiv.org/abs/2102.13565), directly applying mixed precision training in RL reduces performance. Have you tried mixed precision training in RL? Did you see a performance drop?
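For context, "directly applying mixed precision training" here would roughly mean the standard PyTorch AMP recipe below. This is a minimal sketch with a toy policy and fake rollout data; none of the names come from the paper or from any specific codebase:

```python
import torch
import torch.nn as nn

# toy policy network and fake rollout data, just to make the loop runnable
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4)).cuda()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling to avoid fp16 underflow

obs = torch.randn(256, 8, device="cuda")
actions = torch.randint(0, 4, (256,), device="cuda")
advantages = torch.randn(256, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    # forward pass runs in reduced precision under autocast
    with torch.cuda.amp.autocast():
        logits = policy(obs)
        log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
        loss = -(log_probs * advantages).mean()  # vanilla policy-gradient loss
    # backward on the scaled loss; the optimizer step updates fp32 master weights
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```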
It checks for the green glow that appears after clicking the cargo, so there is no need to recognize the buildings.
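For anyone curious, the idea is roughly the following. This is only an illustration with PIL; the coordinates, threshold, and capture method are placeholders, not the ones the script actually uses:

```python
from PIL import ImageGrab

def green_glow_visible(x, y, threshold=60):
    """Return True if the pixel at (x, y) looks like the green highlight.

    (x, y) and the threshold are illustrative values, not the script's own.
    """
    r, g, b = ImageGrab.grab().getpixel((x, y))[:3]
    # "green glow" heuristic: the green channel clearly dominates red and blue
    return g - max(r, b) > threshold
```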
You can switch the emulator to 1920*1080, then use 抓抓 to grab the coordinates of the 9 buildings, the 3 trains, and the various buttons, and update the corresponding coordinates in the code. Instructions for using 抓抓 are in the "注意事项" (Notes) section of the README.
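The edit amounts to maintaining a small coordinate table like the sketch below, where every value is a placeholder that you would replace with your own 抓抓 measurements at 1920*1080 (the names are illustrative, not the script's actual variables):

```python
# Illustrative layout only: re-grab each point with 抓抓 at 1920*1080
# and fill in your own values.
BUILDING_POS = [
    (480, 300), (760, 300), (1040, 300),
    (480, 560), (760, 560), (1040, 560),
    (480, 820), (760, 820), (1040, 820),
]  # 9 buildings
TRAIN_POS = [(300, 980), (860, 980), (1420, 980)]  # 3 trains
BUTTON_POS = {"upgrade": (1700, 950), "confirm": (1180, 760)}  # the various buttons
```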
Thanks for pointing this out. The login screen seems to differ slightly for everyone, so if yours doesn't match mine you'll have to adjust it manually.
Download the checkpoints first, as described in the README: `bash scripts/download_first_stages.sh`
> > 1. It seems that prompts are still passing to vllm engines in micro rollout batches during make_experience.
> >
> > However, passing all prompts to vllm engines all at...
> @LSC527 If vLLM is enabled, training and inference are separated, so the inference and training compute of the actor model and the critic model are comparable; it's recommended to give both the same number of GPUs.

@wuxibin89 My actor model is a 70b llama2 while my critic model is a 13b llama2, which is much smaller, so I assigned fewer GPUs to the critic.

> > Profiling shows an all_gather communication lasting up to 80 seconds at the start of the actor model's inference when computing action_log_probs.
>
> This all_gather overhead comes from the parameter sync between the actor and vllm: after the training phase, an all_gather collects the parameters onto rank 0 of the actor model, which then broadcasts them to all vllm ranks. https://github.com/OpenLLMAI/OpenRLHF/blob/main/openrlhf/trainer/ray/ppo_actor.py#L142-L145

Why would the cost of `_broadcast_to_vllm` show up in `actor_time`? https://github.com/OpenLLMAI/OpenRLHF/blob/main/openrlhf/trainer/ppo_utils/experience_maker.py#L257-L259
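For readers following along, the sync pattern being discussed looks roughly like this. It is a simplified sketch, not the actual OpenRLHF code; `model_update_group` (actor rank 0 plus all vLLM workers) and the function name are assumptions:

```python
import deepspeed
import torch.distributed as dist

def broadcast_actor_to_vllm(actor_model, model_update_group):
    """Simplified sketch of the actor -> vLLM weight sync discussed above."""
    for name, param in actor_model.named_parameters():
        # Under ZeRO-3 every rank holds only a shard, so the full tensor must
        # first be materialized via an all_gather across the actor ranks ...
        with deepspeed.zero.GatheredParameters([param]):
            # ... and then rank 0 broadcasts it to the vLLM workers, which
            # issue a matching dist.broadcast call in their own processes.
            if dist.get_rank() == 0:
                dist.broadcast(param.data, src=0, group=model_update_group)
```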
@wuxibin89 actor_time is very long at every step, and it stays that way even if I remove _broadcast_to_vllm entirely. So far I've only observed this with actor_num_nodes>1 + zero3. https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ray/ppo_actor.py#L116 There is a torch.distributed.barrier() after _broadcast_to_vllm, so its cost should not be counted towards actor_time; actor_time looks like purely the actor model's zero3 forward time. I'll keep digging.
@wuxibin89 In the end I added barriers before and after ray.get(llm.generate.remote()) and found that this line is where the extra time comes from. Without the barriers, the extra time gets attributed to actor_time. https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_utils/experience_maker.py#L344
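For reference, the barrier trick looks roughly like this. It is an illustrative helper, not code from OpenRLHF; `prompts`, `actor`, and `attention_mask` in the usage comments are placeholders:

```python
import time
import torch.distributed as dist

def timed_blocking(label, fn):
    """Barriers on both sides ensure that time spent waiting on other ranks
    (or on the remote vLLM call) is charged to this block instead of leaking
    into whatever is measured next."""
    dist.barrier()
    start = time.time()
    result = fn()
    dist.barrier()
    print(f"{label}: {time.time() - start:.1f}s")
    return result

# e.g. measure generation and the actor forward separately:
# sequences = timed_blocking("generate", lambda: ray.get(llm.generate.remote(prompts)))
# action_log_probs = timed_blocking("actor_forward", lambda: actor(sequences, attention_mask))
```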
> Hi, have you solved this problem?

I'm so sorry. I have abandoned DeepSpeed-Chat for RLHF unless they solve this issue. `inference_tp_size` > 1 is a must...