verifiers
Display running average of metrics during rollout generation + scoring
It would be cool to show a running average of metrics during generation + scoring. The most basic ones I can think of are reward and seq_len, shown in the tqdm bar description and updated as rollouts complete.
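A rough sketch of what this could look like, just to make the idea concrete. Everything here is hypothetical and not taken from the verifiers codebase: `RunningMean`, `format_desc`, `fake_rollout`, `run_with_progress`, and the `completion_len` metric name are made up for illustration. Only `tqdm`'s `set_description` and `update` are real calls.

```python
import asyncio
import random


class RunningMean:
    """Tracks a per-metric running mean (hypothetical helper)."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, metrics):
        # Accumulate each metric separately so missing keys don't skew others.
        for name, value in metrics.items():
            self.sums[name] = self.sums.get(name, 0.0) + float(value)
            self.counts[name] = self.counts.get(name, 0) + 1

    def means(self):
        return {name: self.sums[name] / self.counts[name] for name in self.sums}


def format_desc(means):
    """Render the means as a short string for the bar description."""
    return " ".join(f"{name}={value:.3f}" for name, value in means.items())


async def fake_rollout(i):
    # Stand-in for generating and scoring one rollout (assumption).
    await asyncio.sleep(random.random() * 0.01)
    return {"reward": random.random(), "completion_len": random.randint(10, 200)}


async def run_with_progress(n):
    from tqdm import tqdm  # third-party; imported lazily so the helpers work without it

    tracker = RunningMean()
    tasks = [asyncio.create_task(fake_rollout(i)) for i in range(n)]
    with tqdm(total=n) as bar:
        # as_completed yields results in finish order, so the bar
        # updates as soon as each rollout is done.
        for finished in asyncio.as_completed(tasks):
            tracker.update(await finished)
            bar.set_description(format_desc(tracker.means()))
            bar.update(1)
    return tracker.means()
```

Running `asyncio.run(run_with_progress(100))` should show the averages changing in the bar description as rollouts finish. Per-metric counts are kept so that a metric missing from some rollouts does not drag down the others.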
@mikasenghaas I started experimenting with this idea. By seq_len, do you mean the completion length?
Probably completion length, yeah.
I'll open a PR soon (in the next few days)
@mikasenghaas is this still relevant? I opened #443, but it will require more work after the eval refactoring. LMK and I'll adapt my PR.