Support for multi-turn online RL training

Status: Open. UbeCc opened this issue 9 months ago • 15 comments

Currently, verl only supports single-turn RL training. Since agent tuning is becoming urgent, will verl support multi-turn RL in the next few days? Maybe I can help. Thanks!

@PeterSH6 @zhaochenyang20

UbeCc, Feb 25 '25

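Sure, you're welcome to join. The proposal should be professional, though; here is how I structure my proposals to SGLang:

Research project proposal

  1. What is the problem, and how can it be clearly defined;
  2. What is the scope of the problem, what strong assumptions does it rely on, and are those assumptions reasonable;
  3. Who cares about this problem? Don't just say "Google cares about it"; be precise down to a specific team at Google, or even a specific person, who cares;
  4. What are the existing solutions, and what are their shortcomings;
  5. What might our solution be (it need not be fully written out at proposal time);
  6. How will we evaluate it? What result would show that our method works;
  7. What uncertainties does the work involve, and what are the risks;
  8. What is the timeline for the planned work;

Feature proposal

  1. What does the target framework's current implementation look like, and what problems does it have;
  2. What is the proposed change, and which parts are expected to be modified;
  3. What effect will the change achieve;
  4. What uncertainties are there;
  5. What is the timeline for the planned work;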

zhaochenyang20, Feb 25 '25

@UbeCc Nice suggestion! We can discuss the plan this week. Could you connect with us through WeChat or Slack?

PeterSH6, Feb 25 '25

@UbeCc @PeterSH6 I can connect you guys. Haoran is my senior. And, good night 😂

zhaochenyang20, Feb 25 '25

> @UbeCc @PeterSH6 I can connect you guys. Haoran is my senior. And, good night 😂

Thanks Chenyang, enjoy your day!

UbeCc, Feb 25 '25

> @UbeCc Nice suggestion! We can discuss the plan this week. Could you connect with us through WeChat or Slack?

Yeah, let me send my WeChat ID by email.

UbeCc, Feb 25 '25

Great idea! I could also offer some help!

YSLIU627, Feb 25 '25

I'm also working on multi-turn online RL training at the moment, and I'd be glad to help if you need it. Maybe we can create a WeChat group and add everyone to it for discussion.

AIBionics, Feb 26 '25

Yeah, could you please send your WeChat ID to me by email?

Let me create a group so we can work together.

[email protected]

UbeCc, Feb 26 '25

Great! My WeChat is liuzhihan0627. See you then. Best, Zhihan

YSLIU627, Feb 26 '25

> Yeah could you plz send your WeChat id to me through email?
>
> Let me create a group and work together.
>
> [email protected]

I've already written a multi-turn implementation and it works, but the training is not stable. Could you add me to the WeChat group? My WeChat ID is Zukala-Koth. Many thanks! @UbeCc

sbl1996, Feb 27 '25

@sbl1996 Sure. I will tell him tomorrow.

zhaochenyang20, Feb 27 '25

> Yeah could you plz send your WeChat id to me through email? Let me create a group and work together. [email protected]
>
> I've already wrote a multi-turn implementation and it works, but the training is not stable. Could you add me to the wechat group? My wechat id is Zukala-Koth. Great thanks! @UbeCc

Done. Thank you!

UbeCc, Feb 27 '25

Hi, I’m interested in multiturn RL as well. Could you please add me to the group? My WeChat ID is sfoliver. Thanks a lot! @UbeCc

oliverz20, Feb 28 '25

Hi @UbeCc, I'm also really into multi-turn RL and would love to join the group! My WeChat ID is Liu_Qihuang. Looking forward to connecting and learning more. Thanks!

Tshiyao, Mar 04 '25

Got it. We are already working on it. Thanks for your support!

UbeCc, Mar 04 '25

Hi @UbeCc, I'm also interested in multi-turn RL and would love to join the group! My WeChat ID is innerpeace. Looking forward to connecting and learning more. Thanks a lot!

Jackory, Mar 07 '25

@PeterSH6 @UbeCc @zhaochenyang20 I'm interested in multi-turn RL as well. We have a real-world use case and I was going to start my own implementation before seeing this thread. Would love to contribute or discuss technical design, whichever is preferable!

hongyi-zhang, Mar 12 '25

@UbeCc @PeterSH6 @zhaochenyang20 Interested in contributing! I have a related multi-turn RL implementation, but it's not very efficient. My WeChat is Tianzhe011127.

LeslieTrue, Mar 13 '25

@UbeCc @PeterSH6 @zhaochenyang20 I am working on multi-step RL training for agents and would like to join the WeChat group! My WeChat ID is weiquan0128. Looking forward to connecting and learning more. Thanks!

quanwei0, Mar 18 '25

Thank you for your interest! We already have a large group of people working on this feature. We will keep you posted on any progress!

UbeCc, Mar 18 '25

Same for me. I am also working on multi-turn RL. My WeChat is x34ren. Could you please add me to the group?

XuanRen4470, Mar 21 '25

@UbeCc @PeterSH6 @zhaochenyang20 I'm excited about multi-turn RL and would be glad to join the group. My WeChat is alex-kovrigin; happy to connect and dive deeper into the topic. Thanks!

waleko, Apr 05 '25

demo: https://github.com/volcengine/verl/pull/917

eric-haibin-lin, Apr 06 '25

Thank you @eric-haibin-lin! I am curious whether https://github.com/volcengine/verl/pull/917 is ready for use?

DachengLi1, Apr 06 '25

> Thank you @eric-haibin-lin! I am curious whether #917 is ready for use?

Indeed, you should ask us 😂 The code is ready, but the validation score still does not improve. It works in our closed-source sandbox, but not yet with the open-source sandbox. We will open-source and merge it anyway early next week.

zhaochenyang20, Apr 06 '25

Initial support is now available: https://verl.readthedocs.io/en/latest/sglang_multiturn/multiturn.html. Please open new issues for specific follow-up problems.

eric-haibin-lin, Jun 18 '25