New code utils
@vwxyzjn do we still need this PR? Not sure if it's additional stuff on top of the code tool or not.
Ok, now that we merged the PR which added async-by-default and configurable verifiers, we're (finally) able to cleanly merge in the code verifier. There's code in setup_ray_node.sh which spins up a load-balanced code-execution server locally on the training machine and then writes the endpoint to an env variable, which the training script reads. This gives us super reliable code execution during training that won't falter regardless of how many training jobs we're running in parallel.
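For anyone skimming, here's a minimal sketch of the "read the endpoint from an env var and hit the local server" flow. The variable name `CODE_EXECUTION_ENDPOINT`, the `/execute` route, and the response shape are all assumptions for illustration, not the actual contract in setup_ray_node.sh:

```python
# Hypothetical sketch of how the training script could consume the endpoint
# written out by setup_ray_node.sh. Env var name, route, and payload/response
# fields are placeholders, not the real interface.
import os
import aiohttp

CODE_ENDPOINT = os.environ["CODE_EXECUTION_ENDPOINT"]  # assumed to be set by setup_ray_node.sh

async def execute_code(snippet: str, tests: list[str], timeout: float = 10.0) -> bool:
    """Send a generated code snippet plus its tests to the local load-balanced server."""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{CODE_ENDPOINT}/execute",
            json={"code": snippet, "tests": tests},
            timeout=aiohttp.ClientTimeout(total=timeout),
        ) as resp:
            result = await resp.json()
            # Assumed response field; the real server may report per-test results instead.
            return bool(result.get("passed", False))
```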
Side note: there was support for some weirdness with passing a list of datasets for a single training instance. I did a quick scan of our datasets and didn't see any place where that was the case, so I gutted it to simplify the code. Lemme know if that messes stuff up. It's in apply_verifiable_rewards in model_utils.py.
^^^ EDIT: I was being dumb, I didn't realize it was for applying multiple verifiers per sample. I restored that. Just needed a rework in how I was storing the code data, NBD. Rough sketch of the restored path below.
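This is not the actual signature of apply_verifiable_rewards in model_utils.py, just an illustration of the "one sample, list of verifiers, combine the rewards" shape that the restored code supports. The combination rule here is an assumption:

```python
# Hypothetical sketch of the multi-verifier path; names and the sum() combination
# rule are assumptions, not the real implementation in model_utils.py.
from typing import Callable

Verifier = Callable[[str, dict], float]  # (model_output, ground_truth_info) -> reward

def apply_verifiers(output: str, info: dict, verifiers: list[Verifier]) -> float:
    """Run every verifier attached to a sample and combine their rewards."""
    rewards = [verify(output, info) for verify in verifiers]
    # Could also be max() or an all-must-pass check depending on the task.
    return sum(rewards)
```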