LSC527
According to [Low-Precision Reinforcement Learning](https://arxiv.org/abs/2102.13565), directly applying mixed precision training in RL reduces performance. Have you tried mixed precision training in RL? Did you see a performance drop?
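For context, "directly applying mixed precision training" here would roughly mean the standard PyTorch AMP recipe below. This is a minimal sketch with a toy policy and fake rollout data; none of the names come from the paper or from any specific codebase:

```python
import torch
import torch.nn as nn

# toy policy network and fake rollout data, just to make the loop runnable
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4)).cuda()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling to avoid fp16 underflow

obs = torch.randn(256, 8, device="cuda")
actions = torch.randint(0, 4, (256,), device="cuda")
advantages = torch.randn(256, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    # forward pass runs in reduced precision under autocast
    with torch.cuda.amp.autocast():
        logits = policy(obs)
        log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
        loss = -(log_probs * advantages).mean()  # vanilla policy-gradient loss
    # backward on the scaled loss; the optimizer step updates fp32 master weights
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```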
It checks for the green glow that appears after clicking the cargo, so there is no need to recognize the buildings.
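For anyone curious, the idea is roughly the following. This is only an illustration with PIL; the coordinates, threshold, and capture method are placeholders, not the ones the script actually uses:

```python
from PIL import ImageGrab

def green_glow_visible(x, y, threshold=60):
    """Return True if the pixel at (x, y) looks like the green highlight.

    (x, y) and the threshold are illustrative values, not the script's own.
    """
    r, g, b = ImageGrab.grab().getpixel((x, y))[:3]
    # "green glow" heuristic: the green channel clearly dominates red and blue
    return g - max(r, b) > threshold
```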
You can switch the emulator to 1920*1080, then use 抓抓 to grab the coordinates of the 9 buildings, the 3 trains, and the various buttons, and update the corresponding coordinates in the code. Instructions for using 抓抓 are in the "注意事项" (Notes) section of the README.
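The edit amounts to maintaining a small coordinate table like the sketch below, where every value is a placeholder that you would replace with your own 抓抓 measurements at 1920*1080 (the names are illustrative, not the script's actual variables):

```python
# Illustrative layout only: re-grab each point with 抓抓 at 1920*1080
# and fill in your own values.
BUILDING_POS = [
    (480, 300), (760, 300), (1040, 300),
    (480, 560), (760, 560), (1040, 560),
    (480, 820), (760, 820), (1040, 820),
]  # 9 buildings
TRAIN_POS = [(300, 980), (860, 980), (1420, 980)]  # 3 trains
BUTTON_POS = {"upgrade": (1700, 950), "confirm": (1180, 760)}  # the various buttons
```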
Thanks for pointing this out. The login screen seems to differ slightly for everyone, so if yours doesn't match mine you'll have to adjust it manually.
Download the checkpoints first, as described in the README: `bash scripts/download_first_stages.sh`
> > 1. It seems that prompts are still passing to vllm engines in micro rollout batches during make_experience.
> >
> > However, passing all prompts to vllm engines all at...
> @LSC527 If vLLM is enabled, training and inference are separated, so the inference and training compute of the actor model and the critic model are comparable; it's recommended to give both the same number of GPUs.

@wuxibin89 My actor model is a 70b llama2 while my critic model is a 13b llama2, which is much smaller, so I assigned fewer GPUs to the critic.

> > Profiling shows an all_gather communication lasting up to 80 seconds at the start of the actor model's inference when computing action_log_probs.
>
> This all_gather overhead comes from the parameter sync between the actor and vllm: after the training phase, an all_gather collects the parameters onto rank 0 of the actor model, which then broadcasts them to all vllm ranks. https://github.com/OpenLLMAI/OpenRLHF/blob/main/openrlhf/trainer/ray/ppo_actor.py#L142-L145

Why would the cost of `_broadcast_to_vllm` show up in `actor_time`? https://github.com/OpenLLMAI/OpenRLHF/blob/main/openrlhf/trainer/ppo_utils/experience_maker.py#L257-L259
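For readers following along, the sync pattern being discussed looks roughly like this. It is a simplified sketch, not the actual OpenRLHF code; `model_update_group` (actor rank 0 plus all vLLM workers) and the function name are assumptions:

```python
import deepspeed
import torch.distributed as dist

def broadcast_actor_to_vllm(actor_model, model_update_group):
    """Simplified sketch of the actor -> vLLM weight sync discussed above."""
    for name, param in actor_model.named_parameters():
        # Under ZeRO-3 every rank holds only a shard, so the full tensor must
        # first be materialized via an all_gather across the actor ranks ...
        with deepspeed.zero.GatheredParameters([param]):
            # ... and then rank 0 broadcasts it to the vLLM workers, which
            # issue a matching dist.broadcast call in their own processes.
            if dist.get_rank() == 0:
                dist.broadcast(param.data, src=0, group=model_update_group)
```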
@wuxibin89 actor_time is very long at every step, and it stays that way even if I remove _broadcast_to_vllm entirely. So far I've only observed this with actor_num_nodes>1 + zero3. https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ray/ppo_actor.py#L116 There is a torch.distributed.barrier() after _broadcast_to_vllm, so its cost should not be counted towards actor_time; actor_time looks like purely the actor model's zero3 forward time. I'll keep digging.
@wuxibin89 In the end I added barriers before and after ray.get(llm.generate.remote()) and found that this line is where the extra time comes from. Without the barriers, the extra time gets attributed to actor_time. https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_utils/experience_maker.py#L344
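For reference, the barrier trick looks roughly like this. It is an illustrative helper, not code from OpenRLHF; `prompts`, `actor`, and `attention_mask` in the usage comments are placeholders:

```python
import time
import torch.distributed as dist

def timed_blocking(label, fn):
    """Barriers on both sides ensure that time spent waiting on other ranks
    (or on the remote vLLM call) is charged to this block instead of leaking
    into whatever is measured next."""
    dist.barrier()
    start = time.time()
    result = fn()
    dist.barrier()
    print(f"{label}: {time.time() - start:.1f}s")
    return result

# e.g. measure generation and the actor forward separately:
# sequences = timed_blocking("generate", lambda: ray.get(llm.generate.remote(prompts)))
# action_log_probs = timed_blocking("actor_forward", lambda: actor(sequences, attention_mask))
```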
> Hi, have you solved this problem?

I'm so sorry. I have abandoned DeepSpeed-Chat for RLHF unless they solve this issue. `inference_tp_size` > 1 is a must...