verl
[Question] Does verl support multi-round conversation RL training?
Does verl support multi-round conversation RL training? If it does, what format should I use for the dataset parquet files?
It's not currently supported. See this issue:
https://github.com/volcengine/verl/issues/398