How to run the multiturn rollout?
I followed https://github.com/volcengine/verl/blob/main/examples/sglang_multiturn/README.md to generate a multi-turn dataset and run Multi-Turn Rollout, but I found that the generated dataset seems to be single turn? Is there any other way to generate a multi-turn dataset and run Multi-Turn Rollout?
the dataset I get:
{"data_source":"openai/gsm8k","prompt":[{"content":"You are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the calc_gsm8k_reward tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of #### <answer>.","role":"system"},{"content":"Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? Let's think step by step and output the final answer after ####.","role":"user"}],"ability":"math","reward_model":{"ground_truth":"72","style":"rule"},"extra_info":{"answer":"Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72","index":0,"need_tools_kwargs":true,"question":"Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?","split":"train","tools_kwargs":{"calc_gsm8k_reward":{"create_kwargs":{"ground_truth":"72"}}}}}
{"data_source":"openai/gsm8k","prompt":[{"content":"You are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the calc_gsm8k_reward tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of #### <answer>.","role":"system"},{"content":"Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step and output the final answer after ####.","role":"user"}],"ability":"math","reward_model":{"ground_truth":"10","style":"rule"},"extra_info":{"answer":"Weng earns 12/60 = $<<12/60=0.2>>0.2 per minute.\nWorking 50 minutes, she earned 0.2 x 50 = $<<0.2*50=10>>10.\n#### 10","index":1,"need_tools_kwargs":true,"question":"Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?","split":"train","tools_kwargs":{"calc_gsm8k_reward":{"create_kwargs":{"ground_truth":"10"}}}}}
The example dataset in https://verl.readthedocs.io/en/latest/sglang_multiturn/multiturn.html#handling-multi-modal-inputs-in-datasets:
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2 + 2?"},
{"role": "assistant", "content": "