OLMo
Create an eval-only script for existing ckpts
This PR adds scripts/eval.py, which evaluates one or more existing ckpts while bypassing the training steps.
It seems impossible to backfill evals into the original wandb run, because "step" must always increase. Rewinding the run would truncate the existing log, which we don't want. This script therefore logs to a new wandb run.
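To illustrate the constraint, here is a toy sketch of a log that only accepts non-decreasing steps. `MonotonicStepLog` is a hypothetical stand-in written for this note, not the wandb API, and the metric names and values are placeholders. It only assumes the behaviour described above: a run cannot accept a step lower than the last one it has seen.

```python
class MonotonicStepLog:
    """Toy model of a run whose "step" may never go backwards (not the wandb API)."""

    def __init__(self):
        self.history = []    # accepted (step, metrics) pairs, in arrival order
        self.last_step = -1  # highest step accepted so far

    def log(self, metrics: dict, step: int) -> bool:
        # A step lower than the last accepted one is dropped.
        if step < self.last_step:
            return False
        self.history.append((step, metrics))
        self.last_step = step
        return True


# The original training run has already advanced to step 3000.
training_run = MonotonicStepLog()
for step in (1000, 2000, 3000):
    training_run.log({"train/loss": 0.0}, step=step)  # placeholder value

# Backfilling an eval result for step 1000 into that run is dropped.
backfilled = training_run.log({"eval/metric": 0.0}, step=1000)

# A fresh run starts from scratch, so the same steps are accepted in order.
eval_run = MonotonicStepLog()
accepted = [eval_run.log({"eval/metric": 0.0}, step=s) for s in (1000, 2000, 3000)]
```

In this model, `backfilled` is `False` while every entry in `accepted` is `True`, which is why a separate run in the same group is the practical option.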
Starting from a training setup:
- You can keep using the same yaml file.
- Make a copy of the XXX.sh file into XXX-eval.sh, point it to scripts/eval.py, add a flag --wandb.group=XXX to ensure it logs to the same group, and specify --load_path to be either a single ckpt or all ckpts under a directory.
- Make a copy of the XXX-launch.sh file into XXX-eval-launch.sh, change --task-name to XXX-eval, and change the command so it runs XXX-eval.sh.
See an example in peteish1-eval.sh and peteish1-eval-launch.sh.
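As a rough sketch of the "single ckpt or all ckpts under a directory" behaviour of --load_path, the helper below expands a path into an ordered list of checkpoints. It is not the code in scripts/eval.py: `resolve_checkpoints` is a hypothetical name, and it assumes checkpoint directories are named `step<N>` (optionally with a suffix), which may not match every setup.

```python
import re
from pathlib import Path

# Assumed naming convention: "step1000", "step1000-unsharded", ...
STEP_DIR = re.compile(r"^step(\d+)")


def resolve_checkpoints(load_path: str) -> list[Path]:
    """Return the checkpoints to evaluate, ordered by step number."""
    root = Path(load_path)
    if not root.is_dir():
        # Not a local directory (e.g. a remote path): pass it through unchanged.
        return [root]

    steps = []
    for child in root.iterdir():
        match = STEP_DIR.match(child.name)
        if child.is_dir() and match:
            steps.append((int(match.group(1)), child))

    if not steps:
        # No step sub-directories: treat the path itself as a single ckpt.
        return [root]

    # Sort numerically so that step200 comes before step1000.
    return [path for _, path in sorted(steps)]
```

Sorting by the parsed integer rather than by name keeps the evals in increasing step order, which matters because the new wandb run also needs "step" to increase.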
Let me know when this is ready for another review?