Haolin Yan (闫 浩霖)
Results: 2 issues of Haolin Yan (闫 浩霖)
[recipe] feat: asynchronous reward agent with mini-batch pipeline and one-step off-policy training
10
### What does this PR do?

This PR introduces the **asynchronous reward agent** to schedule and mitigate communication bottlenecks in RL training scenarios that rely on remote reward services (e.g.,...
First of all, I'd like to express my sincere gratitude to all the contributors of this repository! I'm able to run the `allreduce_benchmark` smoothly, but unfortunately, its performance is significantly...