Reaching server limits in olmo-data.org?
🐛 Describe the bug
While training olmo2-1B from a checkpoint, I have been seeing poor, inconsistent connectivity to olmo-data.org over the last few weeks.
I wasn't sure whether my own cluster was being rate-limited, but I continue to get these network errors even after reducing num_workers. The errors do not occur consistently at specific data paths. Often, when I see an error, I try to curl the same path from a different computer and also get an error for a few seconds, after which the link suddenly becomes reachable again.
This issue might be related to the slow dataloader loading speed reported in #869 and #864.
Has the server been facing a lot of requests recently? Or do you have another idea as to what might be causing this issue?
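To help separate a client-side routing problem from a server-side limit, the intermittent failures could be logged per resolved address. Errno 101 is raised by the local `sock.connect` call in the traceback below, so it may be useful to know which address each failure hits. This is only a minimal sketch using the standard library: `probe_once` and `probe_loop` are hypothetical helper names (not part of OLMo), and the `resolve`, `connect`, and `sleep` parameters are there only so the functions can be exercised without network access.

```python
import socket
import time
from datetime import datetime, timezone


def probe_once(host, port=443, timeout=5.0,
               resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Try a TCP connection to every address that `host` resolves to.

    Returns a list of (address, error_or_None) tuples.
    """
    results = []
    for _family, _type, _proto, _canon, sockaddr in resolve(
        host, port, type=socket.SOCK_STREAM
    ):
        addr = sockaddr[0]
        try:
            conn = connect((addr, port), timeout=timeout)
            conn.close()
            results.append((addr, None))
        except OSError as err:  # includes "[Errno 101] Network is unreachable"
            results.append((addr, err))
    return results


def probe_loop(host, rounds=10, interval=2.0, sleep=time.sleep, **kwargs):
    """Probe `host` repeatedly and return timestamped log lines."""
    log = []
    for _ in range(rounds):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        for addr, err in probe_once(host, **kwargs):
            status = "ok" if err is None else f"FAIL {err!r}"
            log.append(f"{stamp} {addr} {status}")
        sleep(interval)
    return log
```

Running `print("\n".join(probe_loop("olmo-data.org")))` on a failing node and on an unrelated machine at the same time would show whether the failures line up in time and whether they are tied to particular addresses.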
CRITICAL Uncaught ConnectionError: Caught ConnectionError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "<python_lib_path>/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
File "<python_lib_path>/urllib3/util/connection.py", line 85, in create_connection
raise err
File "<python_lib_path>/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<python_lib_path>/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
File "<python_lib_path>/urllib3/connectionpool.py", line 488, in _make_request
raise new_e
File "<python_lib_path>/urllib3/connectionpool.py", line 464, in _make_request
self._validate_conn(conn)
File "<python_lib_path>/urllib3/connectionpool.py", line 1093, in _validate_conn
conn.connect()
File "<python_lib_path>/urllib3/connection.py", line 753, in connect
self.sock = sock = self._new_conn()
File "<python_lib_path>/urllib3/connection.py", line 213, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object>: Failed to establish a new connection: [Errno 101] Network is unreachable
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<python_lib_path>/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "<python_lib_path>/urllib3/connectionpool.py", line 841, in urlopen
retries = retries.increment(
File "<python_lib_path>/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='olmo-data.org', port=443): Max retries exceeded with url: /preprocessed/dclm/.../part-075-00001.npy
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
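Since the outages seem to last only a few seconds, a possible stopgap on the client side is to retry the failing read with exponential backoff instead of letting the DataLoader worker crash. The following is a rough sketch under that assumption, not how OLMo actually fetches data: `retry_with_backoff` and its `fetch` callable are hypothetical, and where such a wrapper would hook into the data loading code is not shown here.

```python
import random
import time


def retry_with_backoff(fetch, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call fetch() until it succeeds or `attempts` calls have failed.

    Retries only on OSError (which covers the built-in ConnectionError) and
    waits with exponential backoff plus jitter between attempts.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many workers from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

With `requests`, `fetch` could be something like `lambda: requests.get(url, timeout=30).content`; `requests.exceptions.ConnectionError` derives from `OSError`, so it would be caught by the same `except` clause.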
Versions
Python 3.10.12 ai2-olmo==0.6.0 ai2-olmo-core==0.1.0 aiohappyeyeballs==2.6.1 aiohttp==3.12.15 aiosignal==1.4.0 annotated-types==0.7.0 antlr4-python3-runtime==4.9.3 apt-clone==0.2.1 asttokens==3.0.0 async-timeout==5.0.1 attrs==21.2.0 Automat==20.2.0 Babel==2.8.0 bcrypt==3.2.0 blinker==1.4 boto3==1.40.3 botocore==1.40.3 cached_path==1.7.3 cachetools==5.5.2 certifi==2020.6.20 chardet==4.0.0 charset-normalizer==3.4.2 click==8.0.3 cloud-init==25.1.2 colorama==0.4.4 comm==0.2.3 command-not-found==0.3 configobj==5.0.6 constantly==15.1.0 contourpy==1.3.2 cryptography==3.4.8 cycler==0.12.1 datasets==4.0.0 dbus-python==1.2.18 debugpy==1.8.15 decorator==5.2.1 diceware==0.9.6 dill==0.3.8 distro==1.7.0 distro-info==1.1+ubuntu0.2 dnspython==2.1.0 einops==0.8.1 exceptiongroup==1.3.0 executing==2.2.0 filelock==3.18.0 flash_attn==2.8.2 fonttools==4.59.2 frozenlist==1.7.0 fsspec==2025.3.0 gitdb==4.0.12 GitPython==3.1.45 google-api-core==2.25.1 google-auth==2.40.3 google-cloud-core==2.4.3 google-cloud-storage==2.19.0 google-crc32c==1.7.1 google-resumable-media==2.7.2 googleapis-common-protos==1.70.0 gpg==1.16.0 greenlet==1.1.2 gyp==0.1 hf-xet==1.1.7 httplib2==0.20.2 huggingface-hub==0.34.3 hyperlink==21.0.0 idna==3.3 importlib-metadata==4.6.4 importlib_resources==6.5.2 incremental==21.3.0 ipykernel==6.30.1 ipython==8.37.0 jedi==0.19.2 jeepney==0.7.1 Jinja2==3.0.3 jmespath==1.0.1 joblib==1.5.1 jsonpatch==1.32 jsonpointer==2.0 jsonschema==3.2.0 jupyter_client==8.6.3 jupyter_core==5.8.1 keyring==23.5.0 kiwisolver==1.4.9 launchpadlib==1.10.16 lazr.restfulclient==0.14.4 lazr.uri==1.0.6 lightning-utilities==0.15.1 Markdown==3.3.6 markdown-it-py==3.0.0 MarkupSafe==2.0.1 matplotlib==3.10.6 matplotlib-inline==0.1.7 mdurl==0.1.2 mercurial==6.1.1 more-itertools==8.10.0 mpmath==1.3.0 msgpack==1.0.3 multidict==6.6.3 multiprocess==0.70.16 nest-asyncio==1.6.0 netifaces==0.11.0 networkx==3.4.2 numpy==2.2.6 nvidia-cublas-cu12==12.6.4.1 nvidia-cuda-cupti-cu12==12.6.80 nvidia-cuda-nvrtc-cu12==12.6.77 
nvidia-cuda-runtime-cu12==12.6.77 nvidia-cudnn-cu12==9.5.1.17 nvidia-cufft-cu12==11.3.0.4 nvidia-cufile-cu12==1.11.1.6 nvidia-curand-cu12==10.3.7.77 nvidia-cusolver-cu12==11.7.1.2 nvidia-cusparse-cu12==12.5.4.2 nvidia-cusparselt-cu12==0.6.3 nvidia-nccl-cu12==2.26.2 nvidia-nvjitlink-cu12==12.6.85 nvidia-nvtx-cu12==12.6.77 oauthlib==3.2.0 omegaconf==2.3.0 packaging==25.0 pandas==2.3.1 parso==0.8.4 pexpect==4.8.0 pillow==11.3.0 platformdirs==4.3.8 prompt_toolkit==3.0.51 propcache==0.3.2 proto-plus==1.26.1 protobuf==6.31.1 psutil==7.0.0 ptyprocess==0.7.0 pure_eval==0.2.3 pyarrow==21.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.1 pydantic==2.11.7 pydantic_core==2.33.2 Pygments==2.19.2 PyGObject==3.42.1 PyHamcrest==2.0.2 PyJWT==2.3.0 pynvim==0.4.2 pyOpenSSL==21.0.0 pyparsing==2.4.7 pyrsistent==0.18.1 pyserial==3.5 python-apt==2.4.0+ubuntu4 python-dateutil==2.9.0.post0 python-debian==0.1.43+ubuntu1.1 python-magic==0.4.24 pytz==2022.1 PyYAML==5.4.1 pyzmq==27.0.1 regex==2025.7.34 requests==2.32.4 requests-toolbelt==0.9.1 rich==13.9.4 rsa==4.9.1 s3transfer==0.13.1 safetensors==0.6.1 scikit-learn==1.7.1 scipy==1.15.3 screen-resolution-extra==0.0.0 SecretStorage==3.3.1 sentry-sdk==2.34.1 service-identity==18.1.0 six==1.16.0 smmap==5.0.2 sos==4.8.2 ssh-import-id==5.11 stack-data==0.6.3 sympy==1.14.0 systemd-python==234 threadpoolctl==3.6.0 tokenizers==0.21.4 torch==2.7.1 torchaudio==2.7.1 torchmetrics==1.8.0 torchvision==0.22.1 tornado==6.5.1 tqdm==4.67.1 traitlets==5.14.3 transformers==4.55.0 triton==3.3.1 Twisted==22.1.0 typing-inspection==0.4.1 typing_extensions==4.14.1 tzdata==2025.2 ubuntu-drivers-common==0.0.0 ubuntu-pro-client==8001 ufw==0.36.1 unattended-upgrades==0.1 urllib3==2.5.0 vboxapi==1.0 wadllib==1.3.6 wandb==0.21.0 wcwidth==0.2.13 xkit==0.0.0 xxhash==3.5.0 yarl==1.20.1 zipp==1.0.0 zope.interface==5.4.0