OlmoConfigurationError
❓ The question
When I run torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml, I get:
OlmoConfigurationError: FileNotFoundError raised while resolving interpolation: no_exist/checkpoints, /results
full_key: save_folder
object_type=TrainConfig
Hi @DouPiChen, the issue you're facing is that the destination the training run is trying to save to does not exist.
In configs/official/OLMo-1B.yaml, the save_folder field determines where the training checkpoints are saved. Its value is ${path.choose:${oc.env:SCRATCH_DIR,no_exist}/checkpoints,/results}. The inner expression ${oc.env:SCRATCH_DIR,no_exist} resolves to the value of the environment variable SCRATCH_DIR, or to no_exist if that variable is not set. It does not appear to be set in your environment, so the field becomes ${path.choose:no_exist/checkpoints,/results}. path.choose picks the first path in the list that exists. Since neither no_exist/checkpoints nor /results exists on your system, you get the OlmoConfigurationError.
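The resolution order described above can be sketched in plain Python. This is only an illustration and not OLMo's actual resolver code: env_or_default, path_choose, and resolve_save_folder are hypothetical names, and the comma-joined error message is an assumption modeled on the traceback in the question.

```python
import os
import tempfile
from pathlib import Path


def env_or_default(name: str, default: str) -> str:
    """Hypothetical stand-in for ${oc.env:NAME,default}:
    the environment variable's value if set, else the default."""
    return os.environ.get(name, default)


def path_choose(*candidates: str) -> str:
    """Hypothetical stand-in for ${path.choose:...}:
    return the first candidate that exists on disk."""
    for candidate in candidates:
        if Path(candidate).exists():
            return candidate
    # Assumed to mirror the message seen in the traceback.
    raise FileNotFoundError(", ".join(candidates))


def resolve_save_folder() -> str:
    """Mimics how the save_folder value from the config is resolved."""
    scratch = env_or_default("SCRATCH_DIR", "no_exist")
    return path_choose(f"{scratch}/checkpoints", "/results")


if __name__ == "__main__":
    # Once SCRATCH_DIR is set and its checkpoints subdirectory exists,
    # the first candidate is chosen and no error is raised.
    with tempfile.TemporaryDirectory() as scratch:
        os.makedirs(os.path.join(scratch, "checkpoints"))
        os.environ["SCRATCH_DIR"] = scratch
        print(resolve_save_folder())
```

With SCRATCH_DIR unset and no /results directory present, resolve_save_folder() in this sketch raises FileNotFoundError, which corresponds to the failure in the question.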
To fix the problem, you can either set SCRATCH_DIR to a directory that contains a checkpoints subdirectory (so that $SCRATCH_DIR/checkpoints exists), or change save_folder in the config to point elsewhere.
Thank you!