Create an eval-only script for existing ckpts

Open liujch1998 opened this issue 1 year ago • 1 comments

This PR adds scripts/eval.py, which evaluates one or more existing ckpts while bypassing the training steps.

It seems impossible to backfill evals into the original wandb run, because "step" must always increase. Rewinding the run would truncate its log, which we don't want. Therefore, this script logs its results to a new wandb run.

Starting from a training setup:

1. Keep using the same yaml file.
2. Copy the XXX.sh file to XXX-eval.sh, point it to scripts/eval.py, add the flag --wandb.group=XXX so it logs to the same group, and set --load_path to either a single ckpt or a directory containing multiple ckpts (all of them will be evaluated).
3. Copy the XXX-launch.sh file to XXX-eval-launch.sh, change --task-name to XXX-eval, and change the command so it runs XXX-eval.sh.

See an example in peteish1-eval.sh and peteish1-eval-launch.sh.
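
For reviewers, here is a rough sketch of how a `--load_path` value could be resolved into either a single ckpt or every ckpt under a directory. It is not the code in scripts/eval.py. The function name `resolve_checkpoints` and the `step<N>` / `step<N>-unsharded` directory naming are assumptions made for illustration. Sorting by step keeps the order compatible with wandb's requirement that "step" always increases.

```python
import re
from pathlib import Path

# Assumption: checkpoint directories are named "step<N>" or "step<N>-unsharded".
STEP_PATTERN = re.compile(r"^step(\d+)(?:-unsharded)?$")


def resolve_checkpoints(load_path):
    """Return a list of (step, path) pairs sorted by increasing step.

    If load_path itself looks like a checkpoint directory, return only that
    one. Otherwise treat it as a parent directory and collect every child
    directory whose name matches the assumed pattern.
    """
    root = Path(load_path)

    # Case 1: load_path points at a single checkpoint.
    match = STEP_PATTERN.match(root.name)
    if match:
        return [(int(match.group(1)), root)]

    # Case 2: load_path is a directory holding several checkpoints.
    found = []
    for child in root.iterdir():
        match = STEP_PATTERN.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))

    # Evaluate in increasing step order so logged steps never go backwards.
    return sorted(found, key=lambda item: item[0])
```

Entries that do not match the pattern (for example a `latest` symlink or a stray config file) are skipped rather than treated as errors, which seems like the safer default for a directory that training is still writing to.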

Oct 20 '24 23:10 liujch1998

Let me know when this is ready for another review.

Nov 01 '24 23:11 dirkgr