
OlmoConfigurationError

Open · DouPiChen opened this issue on Feb 06 '24 · 1 comment

❓ The question

When I run torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml, I get the following error:

OlmoConfigurationError: FileNotFoundError raised while resolving interpolation: no_exist/checkpoints, /results
    full_key: save_folder
    object_type=TrainConfig

DouPiChen · Feb 06 '24 10:02

Hi @DouPiChen, the issue that you're facing is that the destination the training run is trying to save to does not exist.

In configs/official/OLMo-1B.yaml, the save_folder field determines where training checkpoints are saved. The interpolation ${oc.env:SCRATCH_DIR,no_exist} resolves to the value of the environment variable SCRATCH_DIR, or to the literal string no_exist if that variable is not set. SCRATCH_DIR does not appear to be set in your environment, so ${path.choose:${oc.env:SCRATCH_DIR,no_exist}/checkpoints,/results} in the save_folder field becomes ${path.choose:no_exist/checkpoints,/results}. The path.choose resolver returns the first path in its list that exists. Since neither no_exist/checkpoints nor /results exists on your system, resolution fails with a FileNotFoundError, which is reported as the OlmoConfigurationError you see.
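The resolution chain described above can be emulated in a few lines of plain Python to see why it fails. This is a simplified sketch, not OLMo's or OmegaConf's actual code: env_or_default, path_choose, and resolve_save_folder are hypothetical names, and the real resolvers may handle cases this sketch ignores.

```python
import os
from pathlib import Path


def env_or_default(name: str, default: str) -> str:
    """Hypothetical stand-in for ${oc.env:NAME,default}.

    Returns the environment variable's value, or `default` if it is unset.
    """
    return os.environ.get(name, default)


def path_choose(*paths: str) -> str:
    """Hypothetical stand-in for ${path.choose:...}.

    Returns the first candidate path that exists on disk.
    """
    for path in paths:
        if Path(path).exists():
            return path
    # No candidate exists: raise with the candidates joined by ", ",
    # matching the shape of the message in the question.
    raise FileNotFoundError(", ".join(paths))


def resolve_save_folder() -> str:
    """Mimic ${path.choose:${oc.env:SCRATCH_DIR,no_exist}/checkpoints,/results}."""
    scratch = env_or_default("SCRATCH_DIR", "no_exist")
    return path_choose(f"{scratch}/checkpoints", "/results")


if __name__ == "__main__":
    os.environ.pop("SCRATCH_DIR", None)  # simulate the unset variable
    try:
        print(resolve_save_folder())
    except FileNotFoundError as err:
        print(f"FileNotFoundError: {err}")
```

With SCRATCH_DIR unset, and with neither ./no_exist/checkpoints nor /results present, the script prints the same "no_exist/checkpoints, /results" message as the error in the question.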

To fix the problem, either set SCRATCH_DIR to a directory such that $SCRATCH_DIR/checkpoints exists, or change save_folder in the config to point elsewhere.
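For the first fix, a small pre-flight step can create the checkpoints directory before training starts. The helper below is a hypothetical sketch (ensure_scratch_checkpoints is an illustrative name, not part of OLMo). It assumes SCRATCH_DIR is exported in the same shell that launches torchrun, because the worker processes inherit that environment.

```python
import os
from pathlib import Path


def ensure_scratch_checkpoints() -> Path:
    """Hypothetical helper: create $SCRATCH_DIR/checkpoints if it is missing.

    Once this directory exists, the first candidate passed to
    path.choose in save_folder exists and is the one selected.
    """
    scratch = os.environ.get("SCRATCH_DIR")
    if not scratch:
        raise RuntimeError(
            "SCRATCH_DIR is not set; export it or edit save_folder instead"
        )
    checkpoints = Path(scratch) / "checkpoints"
    # parents=True creates SCRATCH_DIR itself if needed;
    # exist_ok=True makes repeated runs harmless.
    checkpoints.mkdir(parents=True, exist_ok=True)
    return checkpoints


if __name__ == "__main__":
    try:
        print(f"Checkpoint directory ready: {ensure_scratch_checkpoints()}")
    except RuntimeError as err:
        print(f"Pre-flight check failed: {err}")
```

For example, export SCRATCH_DIR=/path/to/scratch (a placeholder), run the helper, and then launch torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml from the same shell.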

2015aroras · Feb 06 '24 18:02

Thank you!

DouPiChen · Feb 07 '24 01:02