Set fs_local_rank as global_rank when FS_LOCAL_RANK is not available

Open hxdtest opened this issue 1 year ago • 1 comments

In the script scripts/run_with_environment.sh, FS_LOCAL_RANK is set to the same value as RANK:

export RANK=$SLURM_PROCID
export FS_LOCAL_RANK=$SLURM_PROCID

If the job is not launched by scripts/run_with_environment.sh, FS_LOCAL_RANK is not set. When all ranks share the same filesystem, the local rank 0 process on every node then writes global_indices.npy.
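To make the difference concrete, here is a minimal sketch, not the repository's code. It assumes that the current fallback when FS_LOCAL_RANK is unset is the per-node LOCAL_RANK, and that a process writes the file when its filesystem-local rank is 0. The helper names `fs_local_rank` and `writers` are hypothetical.

```python
from typing import List, Mapping


def fs_local_rank(env: Mapping[str, str], fallback_var: str) -> int:
    """Return FS_LOCAL_RANK if set, otherwise the value of `fallback_var`.

    Hypothetical helper for illustration only.
    """
    value = env.get("FS_LOCAL_RANK")
    if value:
        return int(value)
    return int(env[fallback_var])


def writers(envs: List[Mapping[str, str]], fallback_var: str) -> List[int]:
    """Global ranks that would write the file (filesystem-local rank 0)."""
    return [int(e["RANK"]) for e in envs if fs_local_rank(e, fallback_var) == 0]


# Simulated environments for 2 nodes x 2 processes, FS_LOCAL_RANK unset.
envs = [
    {"RANK": str(node * 2 + local), "LOCAL_RANK": str(local)}
    for node in range(2)
    for local in range(2)
]

# Assumed current fallback: one writer per node.
writers_local = writers(envs, "LOCAL_RANK")
# Proposed fallback: a single writer for the whole job.
writers_global = writers(envs, "RANK")
print(writers_local, writers_global)
```

With the LOCAL_RANK fallback, one process per node qualifies as a writer, which is the duplicate write described above on a shared filesystem. With the RANK fallback, only global rank 0 writes.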

hxdtest avatar Feb 18 '24 08:02 hxdtest

I don't think this is the right approach to this problem. Our code (arbitrarily) assumes that unless FS_LOCAL_RANK is set, each node has a separate file system. I don't think assuming that all nodes share the same file system is a better default. The best behavior might be to raise an error and tell the user to explicitly set FS_LOCAL_RANK, so that no assumption is made.
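A rough sketch of what that fail-fast behavior could look like, assuming a standalone helper rather than any existing function in the repository (`require_fs_local_rank` is a hypothetical name, and the error message is only a suggestion):

```python
import os
from typing import Mapping, Optional


def require_fs_local_rank(env: Optional[Mapping[str, str]] = None) -> int:
    """Return FS_LOCAL_RANK, or raise if it is missing or empty.

    Hypothetical helper: no fallback is applied, so no assumption is made
    about whether nodes share a filesystem.
    """
    env = os.environ if env is None else env
    value = env.get("FS_LOCAL_RANK")
    if not value:
        raise RuntimeError(
            "FS_LOCAL_RANK is not set. Set it to the global rank if all nodes "
            "share one filesystem, or to the local rank if each node has its own."
        )
    return int(value)
```

Passing the environment as an argument keeps the helper easy to test. Calling it with no argument reads from `os.environ`.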

2015aroras avatar Mar 11 '24 22:03 2015aroras