Weber Xie
Any news on this question? Looking forward to PyTorch support.
Thanks @da03! My env:
- CentOS 7
- CUDA 9.1
- PyTorch 1.2
Same problem here; looking forward to a fix.
I met the same problem. Can anyone on the team reply to this issue?
Also looking forward to this feature.
Thanks for your reply! So the Reward Model will not be updated in the PPO training loop; is this the standard procedure in the PPO algorithm? Thanks.
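(For illustration only: a minimal PyTorch sketch of that setup, using toy `nn.Linear` stand-ins rather than real language models, showing the reward model frozen and excluded from the PPO optimizer. The module names and the placeholder loss are assumptions, not TRLX's actual code.)

```python
import torch
import torch.nn as nn

# Toy stand-ins (illustrative only; in RLHF these are language models).
hidden = 16
policy = nn.Linear(hidden, hidden)     # policy network being trained by PPO
value_head = nn.Linear(hidden, 1)      # value function
reward_model = nn.Linear(hidden, 1)    # previously trained reward model

# The reward model is frozen: no gradients, and not passed to the optimizer.
reward_model.eval()
for p in reward_model.parameters():
    p.requires_grad_(False)

# Only policy + value parameters are updated inside the PPO loop.
optimizer = torch.optim.Adam(
    list(policy.parameters()) + list(value_head.parameters()), lr=1e-4
)

for _ in range(3):  # stand-in for the PPO outer loop
    states = torch.randn(8, hidden)
    actions = policy(states)                  # rollout
    with torch.no_grad():
        rewards = reward_model(actions)       # reward model only scores samples
    values = value_head(actions)
    # Placeholder objective; real PPO uses a clipped surrogate + value loss.
    loss = ((values - rewards) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```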
Thanks for your kind explanation! I understand the Reward Model is static. Regarding the code implementation of TRLX's ppo_trainer, the policy function and value function are the same model, am I...
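(If it helps later readers, here is a minimal sketch of that shared policy/value pattern: one backbone with both an LM head and a value head, so the two functions share parameters. The class and layer names are made up for illustration and are not copied from TRLX's ppo_trainer.)

```python
import torch
import torch.nn as nn

class PolicyWithValueHead(nn.Module):
    """One backbone produces both the token logits (policy) and a scalar
    value estimate, i.e. the policy and value function share parameters."""

    def __init__(self, vocab_size=100, hidden=32):
        super().__init__()
        self.backbone = nn.Embedding(vocab_size, hidden)   # stand-in for an LM
        self.lm_head = nn.Linear(hidden, vocab_size)       # policy logits
        self.value_head = nn.Linear(hidden, 1)              # value estimate

    def forward(self, input_ids):
        h = self.backbone(input_ids)              # (batch, seq, hidden)
        logits = self.lm_head(h)                  # policy distribution per token
        values = self.value_head(h).squeeze(-1)   # per-token value estimates
        return logits, values

model = PolicyWithValueHead()
logits, values = model(torch.randint(0, 100, (2, 5)))
print(logits.shape, values.shape)  # torch.Size([2, 5, 100]) torch.Size([2, 5])
```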
From the paper *Learning to summarize from human feedback*, it mentions:

> We initialize the value function to the parameters of the reward model. In our experiments, ...
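(A minimal sketch of what that initialization can look like, assuming the value function uses the same architecture as the reward model; the variable names are illustrative and not taken from the paper's or TRLX's code.)

```python
import copy
import torch
import torch.nn as nn

hidden = 32
# Stand-in architecture; in the paper both are transformer LMs with a scalar head.
reward_model = nn.Sequential(
    nn.Linear(hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1)
)

# "We initialize the value function to the parameters of the reward model":
# start the value function as a copy of the trained reward model, then let PPO
# update the value function while the reward model itself stays frozen.
value_function = copy.deepcopy(reward_model)
# Equivalent explicit form: copy the weights via a state dict.
value_function.load_state_dict(reward_model.state_dict())

x = torch.randn(4, hidden)
assert torch.allclose(reward_model(x), value_function(x))  # identical at init
```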