[Feat]: Search Tool Invocation in Multi-Turn RL Training

Checklist Before Starting

[x] Search for similar PR(s).

What does this PR do?

As veRL users, we want the model to invoke designated tools during the Actor rollout phase and seamlessly integrate their outputs into the training pipeline.
We have added search-tool invocation to veRL-sglang MultiTurnRL, enabling the model to issue retrieval requests during Actor rollout and use the returned results directly for training. This gives the community a reimplementation similar to Search-R1.
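The PR text does not spell out the rollout loop itself, so here is a minimal, framework-free sketch of the idea. It is not veRL's implementation. It assumes a Hermes/Qwen-style `<tool_call>{...}</tool_call>` format, a `search` tool that takes a single `query` argument, and hypothetical helpers `rollout`, `Segment`, `Trajectory` and `scripted_model`. Model-generated text is flagged as trainable and search results are not, in the spirit of the retrieved-token masking described in the Search-R1 paper.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed tool-call format (Hermes/Qwen-style); adjust to your chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


@dataclass
class Segment:
    text: str
    trainable: bool  # False for prompt and tool outputs, so they can be masked out of the loss


@dataclass
class Trajectory:
    segments: List[Segment] = field(default_factory=list)

    def text(self) -> str:
        return "".join(s.text for s in self.segments)


def rollout(prompt: str,
            generate: Callable[[str], str],
            search: Callable[[str], str],
            max_turns: int = 4) -> Trajectory:
    """Hypothetical loop: alternate model turns and search calls until no tool call is made."""
    traj = Trajectory([Segment(prompt, trainable=False)])
    for _ in range(max_turns):
        reply = generate(traj.text())
        traj.segments.append(Segment(reply, trainable=True))
        match = TOOL_CALL_RE.search(reply)  # only the first tool call per turn is handled here
        if match is None:
            break  # no tool call: treat this turn as the final answer
        call = json.loads(match.group(1))
        result = search(call["arguments"]["query"])
        traj.segments.append(
            Segment(f"<tool_response>{result}</tool_response>", trainable=False))
    return traj


def scripted_model():
    """Hypothetical stand-in for the policy: one search call, then an answer."""
    replies = iter([
        '<tool_call>{"name": "search", "arguments": {"query": "capital of France"}}</tool_call>',
        "<answer>Paris</answer>",
    ])
    return lambda context: next(replies)


traj = rollout("Q: What is the capital of France?\n", scripted_model(),
               search=lambda q: "Doc 1: Paris is the capital of France.")
print([s.trainable for s in traj.segments])  # [False, True, False, True]
```

In a real trainer the `trainable` flags would become a per-token loss mask after tokenization; keeping them per segment here just makes the bookkeeping visible.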
Training curves on Wandb: search_async_rl
[x] Third-party training reproducibility verification has been successfully completed.
Thanks to the SGLang team and the author of Search-R1 for their efficient support!

Project Members:

Ling Chang (Author)
Bowen Jin (Advisor on Training)
Xiaocheng Wang (Advisor on Implementation)
Nan Jiang (Reproduce)
Chenyang Zhao (PM)
Xiang Long (Reviewer, PM)

Checklist Before Submitting

[x] Read the Contribute Guide.
[x] Apply pre-commit checks.
[x] Add [BREAKING] to the PR title if it breaks any API.
[x] Update the documentation about your changes in the docs.
[x] Add CI test(s) if necessary.

How to Use

Refer to verl-multiturn-searchR1-like.md or verl-multiturn-searchR1-like_ZH.md in the Awesome-ML-SYS-Tutorial repository.

May 25 '25 10:05 Lins-01

All committers have signed the CLA.

May 25 '25 10:05 CLAassistant

https://wandb.ai/lingchang-ustc/search_async_rl/workspace?nw=nwuserlingchang

@eric-haibin-lin Here is the training curve

May 27 '25 00:05 zhaochenyang20

LGTM

May 28 '25 12:05 SwordFaith

great job

May 28 '25 17:05 zhaochenyang20

@Lins-01 Thanks for your contributions to veRL! I noticed that some of the code appears to be referenced from the following projects:

fufankeji/fufan-chat-api (encoder.py) and RUC-NLPIR/FlashRAG (utils.py)

Could you please check the licenses of the referenced code to avoid any potential legal issues?

Some of the code may also duplicate an existing PR:

https://github.com/volcengine/verl/pull/1525/files#

Also, a unit test for the search tooling functionality is welcome.

May 29 '25 04:05 feifeibear

Appreciate the reminder and encouragement! License attributions have been added; we will follow up with the unit test soon.

May 29 '25 10:05 Lins-01

LGTM, but I strongly suggest that the author add the necessary unit tests for search tool usage.

Will add them using a mock search API.

May 29 '25 17:05 zhaochenyang20

@feifeibear

We've added the unit tests for the search tool as recommended.
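For readers who want to write a similar test, here is a minimal sketch of testing a search tool against a mock retriever instead of a live search server. `execute_search`, its `retriever` argument, the `{"contents": ...}` passage shape and the "No results found." fallback are all hypothetical and not taken from the PR; the point is only that injecting the retriever lets `unittest.mock.Mock` stand in for the network call.

```python
import unittest
from typing import Callable, Dict, List
from unittest import mock

# Hypothetical retriever signature: takes a batch of queries and returns,
# for each query, a list of passages shaped like {"contents": "..."}.
Retriever = Callable[..., List[List[Dict[str, str]]]]


def execute_search(query: str, retriever: Retriever, topk: int = 3) -> str:
    """Hypothetical search-tool body: query the retriever and format passages for the model."""
    passages = retriever([query], topk=topk)[0]
    if not passages:
        return "No results found."
    return "\n".join(f"Doc {i + 1}: {p['contents']}" for i, p in enumerate(passages))


class TestSearchTool(unittest.TestCase):
    def test_formats_passages_from_mock_retriever(self):
        fake_retriever = mock.Mock(return_value=[[
            {"contents": "Paris is the capital of France."},
            {"contents": "France is in Western Europe."},
        ]])
        out = execute_search("capital of France", fake_retriever, topk=2)
        # The tool should forward the query as a one-element batch with the requested topk.
        fake_retriever.assert_called_once_with(["capital of France"], topk=2)
        self.assertEqual(
            out,
            "Doc 1: Paris is the capital of France.\nDoc 2: France is in Western Europe.",
        )

    def test_empty_result(self):
        fake_retriever = mock.Mock(return_value=[[]])
        self.assertEqual(execute_search("unknown", fake_retriever), "No results found.")
```

Saved as, for example, `test_search_tool.py`, it runs with `python -m unittest test_search_tool`. Passing the retriever in as an argument, rather than patching a module-level function, keeps the test independent of how the module is imported.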

May 30 '25 07:05 Lins-01

great!

May 31 '25 05:05 zhaochenyang20

Using the merged patch in this PR, I reran training on the original Search-R1 Wikipedia corpus (GRPO schedule, no additional data) and evaluated the resulting model.

| Dataset | Search-R1 paper (Qwen2.5-3B) | This run |
| --- | --- | --- |
| NQ | 0.397 | 0.406 |
| TriviaQA | 0.565 | 0.582 |
| PopQA | 0.391 | 0.420 |
| HotpotQA | 0.331 | 0.338 |
| 2Wiki | 0.310 | 0.332 |
| Musique | 0.124 | 0.111 |
| Bamboogle | 0.232 | 0.296 |
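A quick way to read the comparison is to compute per-dataset differences and an unweighted mean. The snippet below copies the scores from the table; the comment does not name the metric, and the unweighted mean is only a convenience summary, not a number reported in either source.

```python
# Scores copied from the comparison table (Search-R1 paper vs. this rerun).
paper = {"NQ": 0.397, "TriviaQA": 0.565, "PopQA": 0.391, "HotpotQA": 0.331,
         "2Wiki": 0.310, "Musique": 0.124, "Bamboogle": 0.232}
rerun = {"NQ": 0.406, "TriviaQA": 0.582, "PopQA": 0.420, "HotpotQA": 0.338,
         "2Wiki": 0.332, "Musique": 0.111, "Bamboogle": 0.296}

# Per-dataset difference (rerun minus paper), rounded to the table's precision.
deltas = {name: round(rerun[name] - paper[name], 3) for name in paper}

# Unweighted mean over the seven datasets.
paper_avg = sum(paper.values()) / len(paper)
rerun_avg = sum(rerun.values()) / len(rerun)

for name, d in deltas.items():
    print(f"{name:10s} {d:+.3f}")
print(f"unweighted mean: paper {paper_avg:.3f}, rerun {rerun_avg:.3f}")
```

With these numbers, six of the seven datasets improve over the paper, Musique drops by 0.013, and the unweighted mean moves from about 0.336 to 0.355.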

💾 Weights & full inference script are available on the Hub:
https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn

Everything matches the expected behaviour—tool calls, multi-turn rollout and scores. Thanks again for the thorough work! @Lins-01

Jun 03 '25 05:06 SeungyounShin

Wow, thank you for the kind words! We really appreciate your recognition; it's truly encouraging for our team. If possible, could you share the training hyperparameters you used? I believe they would be helpful for the community (my scores were slightly lower, haha). @SeungyounShin

Jun 04 '25 15:06 Lins-01

[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training
