OLMo
NotImplementedError: file size not implemented for 'https' files
❓ The question
Has anyone seen this error before?
I was running "torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml --wandb=null --save_overwrite" for a brand-new training run. I had replaced all the r2 paths with the https:// URLs that are now provided for public download, but file size calculation is not implemented for https files, so the error above is thrown.
Is there a workaround, or does this need to be implemented?
Hey @zhuol, at the moment you'd have to download the files and then change the paths to local file paths.
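For anyone following the download-then-point-to-local-paths route, here is a minimal sketch of what that could look like. It is not part of the OLMo codebase: `local_path_for` and `mirror` are hypothetical helper names, the URL and root directory in the usage note are placeholders, and it assumes the config holds a plain list of data file paths that you paste the resulting local paths into.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_path_for(url: str, root: str) -> Path:
    """Map an HTTPS URL to a file under `root`, mirroring the URL's path."""
    parsed = urlparse(url)
    return Path(root) / parsed.path.lstrip("/")


def mirror(urls, root, download=True):
    """Download each URL under `root` (if missing) and return the local paths.

    With download=False nothing is fetched; only the path mapping is computed.
    """
    local_paths = []
    for url in urls:
        dest = local_path_for(url, root)
        if download and not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            urlretrieve(url, str(dest))  # plain HTTPS GET, saved to dest
        local_paths.append(str(dest))
    return local_paths
```

Calling `mirror(urls, "/data/olmo")` with the https:// URLs from the config returns the list of local paths to put in the config in their place. The directory layout under the root simply mirrors the URL path, which is one possible choice rather than a required structure.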
We might be able to support streaming from the HTTPS URLs, but it depends on whether Cloudflare (R2) allows range requests. This is worth investigating.
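One way to investigate this is to request a single byte with a `Range` header and see whether the server answers `206 Partial Content`. The sketch below uses only the Python standard library. `probe_range_support` and `parse_content_range` are hypothetical helpers, not OLMo functions, and the sketch says nothing about how the endpoint in question actually behaves. When the server does honor the range, the `Content-Range` header usually also carries the total file size, which is the value the failing size calculation needs.

```python
from urllib.request import Request, urlopen


def parse_content_range(value):
    """Return the total size from e.g. 'bytes 0-0/12345', or None if unknown."""
    if not value:
        return None
    unit, _, rest = value.strip().partition(" ")
    if unit != "bytes":
        return None
    total = rest.rpartition("/")[2]
    # The total may be '*' when the server does not know the full length.
    return int(total) if total.isdigit() else None


def probe_range_support(url, timeout=10.0):
    """Ask for the first byte only; return (supports_ranges, total_size)."""
    req = Request(url, headers={"Range": "bytes=0-0"})
    with urlopen(req, timeout=timeout) as resp:
        if resp.status != 206:
            # A 200 here means the server ignored the Range header.
            return False, None
        return True, parse_content_range(resp.headers.get("Content-Range"))
```

Running `probe_range_support(url)` against one of the public data URLs would show whether streaming is feasible for that endpoint.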
Can you please describe which datasets I need to download for pre-training? Where should I put these files, what directory structure should they be stored in, and how should I modify the paths in the config file? Thank you for the help.
I apologize for our delay in response. In order to help surface current, unresolved issues, we are closing tickets prior to February 29. Please reopen your ticket if you are continuing to experience this issue. Thank you!