Multiple PDFs error: [Errno 110] Connect call failed
🐛 Describe the bug
I keep encountering errors when converting multiple PDFs:
WARNING - Client error on attempt 0 for /path-to-pdf/xxx.pdf-1782: <class 'TimeoutError'> [Errno 110] Connect call failed ('127.0.0.1', 31824)
As a result, only around 15% of the PDFs convert successfully.
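For what it's worth, the failure can be reproduced outside the pipeline with a minimal stdlib probe (a diagnostic sketch, not olmocr code; the host and port are taken from the log line above):

    import asyncio

    async def probe(host: str = "127.0.0.1", port: int = 31824) -> None:
        # Attempt the same TCP connect the pipeline's HTTP client makes;
        # [Errno 110] / TimeoutError here means nothing is accepting on this port.
        try:
            _reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout=5
            )
            writer.close()
            await writer.wait_closed()
            print(f"{host}:{port} accepts connections")
        except (OSError, asyncio.TimeoutError) as exc:
            print(f"connect to {host}:{port} failed: {exc!r}")

    asyncio.run(probe())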
Versions
Python 3.11.11 aiohappyeyeballs==2.6.1 aiohttp==3.11.14 aiosignal==1.3.2 annotated-types==0.7.0 anthropic==0.49.0 anyio==4.9.0 asttokens==3.0.0 attrs==25.3.0 beaker-py==1.34.1 bleach==6.2.0 boto3==1.37.14 botocore==1.37.14 cached_path==1.7.1 cachetools==5.5.2 certifi==2025.1.31 cffi==1.17.1 charset-normalizer==3.4.1 click==8.1.8 cloudpickle==3.1.1 compressed-tensors==0.8.0 cryptography==44.0.2 cuda-bindings==12.8.0 cuda-python==12.8.0 datasets==3.4.1 decorator==5.2.1 decord==0.6.0 dill==0.3.8 diskcache==5.6.3 distro==1.9.0 docker==7.1.0 einops==0.8.1 executing==2.2.0 fastapi==0.115.11 filelock==3.18.0 flashinfer==0.1.6+cu124torch2.4 frozenlist==1.5.0 fsspec==2024.12.0 ftfy==6.3.1 gguf==0.10.0 google-api-core==2.24.2 google-auth==2.38.0 google-cloud-core==2.4.3 google-cloud-storage==2.19.0 google-crc32c==1.7.0 google-resumable-media==2.7.2 googleapis-common-protos==1.69.2 h11==0.14.0 hf_transfer==0.1.9 httpcore==1.0.7 httptools==0.6.4 httpx==0.28.1 huggingface-hub==0.27.1 idna==3.10 importlib_metadata==8.6.1 interegular==0.3.3 ipython==9.0.2 ipython_pygments_lexers==1.1.1 jedi==0.19.2 Jinja2==3.1.6 jiter==0.9.0 jmespath==1.0.1 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 lark==1.2.2 lingua-language-detector==2.0.2 litellm==1.63.11 llvmlite==0.44.0 lm-format-enforcer==0.10.11 markdown-it-py==3.0.0 markdown2==2.5.3 MarkupSafe==3.0.2 matplotlib-inline==0.1.7 mdurl==0.1.2 mistral_common==1.5.4 modelscope==1.23.2 mpmath==1.3.0 msgpack==1.1.0 msgspec==0.19.0 multidict==6.2.0 multiprocess==0.70.16 nest-asyncio==1.6.0 networkx==3.4.2 ninja==1.11.1.3 numba==0.61.0 numpy==1.26.4 nvidia-cublas-cu12==12.4.5.8 nvidia-cuda-cupti-cu12==12.4.127 nvidia-cuda-nvrtc-cu12==12.4.127 nvidia-cuda-runtime-cu12==12.4.127 nvidia-cudnn-cu12==9.1.0.70 nvidia-cufft-cu12==11.2.1.3 nvidia-curand-cu12==10.3.5.147 nvidia-cusolver-cu12==11.6.1.9 nvidia-cusparse-cu12==12.3.1.170 nvidia-ml-py==12.570.86 nvidia-nccl-cu12==2.21.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.4.127 -e git+https://github.com/allenai/olmocr.git@3c22cf3430467a4cd3683dfab2652089f0e7a4ce#egg=olmocr openai==1.66.3 opencv-python-headless==4.11.0.86 orjson==3.10.15 outlines==0.0.46 packaging==24.2 pandas==2.2.3 parso==0.8.4 partial-json-parser==0.2.1.1.post5 pexpect==4.9.0 pillow==11.1.0 prometheus-fastapi-instrumentator==7.0.2 prometheus_client==0.21.1 prompt_toolkit==3.0.50 propcache==0.3.0 proto-plus==1.26.1 protobuf==6.30.1 psutil==7.0.0 ptyprocess==0.7.0 pure_eval==0.2.3 py-cpuinfo==9.0.0 pyairports==2.1.1 pyarrow==19.0.1 pyasn1==0.6.1 pyasn1_modules==0.4.1 pycountry==24.6.1 pycparser==2.22 pydantic==2.10.6 pydantic_core==2.27.2 Pygments==2.19.1 pypdf==5.4.0 pypdfium2==4.30.1 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 python-multipart==0.0.20 pytz==2025.1 PyYAML==6.0.2 pyzmq==26.3.0 ray==2.43.0 referencing==0.36.2 regex==2024.11.6 requests==2.32.3 rich==13.9.4 rpds-py==0.23.1 rsa==4.9 s3transfer==0.11.4 safetensors==0.5.3 sentencepiece==0.2.0 setproctitle==1.3.5 sgl-kernel==0.0.3.post1 sglang==0.4.2 six==1.17.0 smart-open==7.1.0 sniffio==1.3.1 stack-data==0.6.3 starlette==0.46.1 sympy==1.13.1 tiktoken==0.9.0 tokenizers==0.20.3 torch==2.5.1 torchao==0.9.0 torchvision==0.20.1 tqdm==4.67.1 traitlets==5.14.3 transformers==4.46.2 triton==3.1.0 typing_extensions==4.12.2 tzdata==2025.1 urllib3==2.3.0 uvicorn==0.34.0 uvloop==0.21.0 vllm==0.6.4.post1 watchfiles==1.0.4 wcwidth==0.2.13 webencodings==0.5.1 websockets==15.0.1 wrapt==1.17.2 xformers==0.0.28.post3 xgrammar==0.1.16 xxhash==3.5.0 yarl==1.18.3 zipp==3.21.0 zstandard==0.23.0
If no one is working on this, I would love to contribute and fix this
Please go ahead and let me know if you want to discuss.
@jakep-allenai could you check whether this is a bug or expected behavior?
@SkaarFacee If you have an idea, please let me know.
Interesting, it's definitely not normal. What sort of host are you running this on, is it within docker? What is your networking setup?
It should be starting SGLang for you on port 30024, so 31824 is weird.
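(As a quick check, a sketch like this, using the psutil already in the environment above, prints every local TCP listener, which would confirm where, or whether, the SGLang server actually bound; it is a hypothetical diagnostic, not part of olmocr:)

    import psutil

    # Print each local TCP port in LISTEN state with its owning PID
    # (PID may be None without sufficient privileges on Linux).
    for conn in psutil.net_connections(kind="tcp"):
        if conn.status == psutil.CONN_LISTEN:
            print(conn.laddr.port, conn.pid)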
@jakep-allenai I am running several olmocr pipelines on one node, so port 30024 is occupied; that is why I used a different one. Does the port number matter? Note that I also see the same issue with port 30024!
I am using all default settings on multi-GPU SLURM/LSF nodes running Ubuntu. I tried both SLURM and LSF.
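One way to pick a non-conflicting port per pipeline instance is to let the OS allocate one. A minimal sketch (hypothetical helper; wiring the result into olmocr's SGLANG_SERVER_PORT is a separate step):

    import socket

    def find_free_port() -> int:
        # Binding to port 0 asks the OS for any currently unused TCP port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("127.0.0.1", 0))
            return s.getsockname()[1]

    print(find_free_port())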
Hmm, interesting. When we run multi-GPU, we run it as, for example, 8 separate Docker containers on one host. Does the error go away if you run just 1 GPU at a time?
Wow! Good idea. When I run a single GPU, the error still occurs...
https://github.com/allenai/olmocr/blob/3edae0ac7110efb735d39a7cc699847e76d92114/olmocr/pipeline.py#L514-L535
Hmm, what if you change this code to add "--host", "0.0.0.0" to the array, so the server binds to all interfaces?
Okay, I can take a look and see what is going on. Please give me a day or so.
Ahh sorry, I got a bit busy the last few weeks. @xcvil Can you help me replicate this? I am not sure how to go about it.
He meant changing the code as follows, so that the server listens on all network interfaces, making it accessible from external machines (if firewall and network settings allow it).
In this file: olmocr/olmocr/pipeline.py
    cmd = [
        "python3",
        "-m",
        "sglang.launch_server",
        "--model-path",
        model_name_or_path,
        "--chat-template",
        args.model_chat_template,
        # "--context-length", str(args.model_max_context),  # Commented out due to crashes
        "--port",
        str(SGLANG_SERVER_PORT),
        "--log-level-http",
        "warning",
        "--host",  # New line
        "0.0.0.0",  # New line
    ]
    cmd.extend(mem_fraction_arg)

    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
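Once the server binds correctly, it can also help to wait until it is reachable before submitting PDFs. A sketch using aiohttp (already in the environment above), assuming the SGLang server exposes a /health endpoint as recent versions do:

    import asyncio
    import aiohttp

    async def wait_for_server(port: int, timeout_s: float = 300.0) -> None:
        # Poll the health endpoint until it answers 200, so requests are
        # only sent once the model server is actually up.
        url = f"http://127.0.0.1:{port}/health"
        deadline = asyncio.get_running_loop().time() + timeout_s
        async with aiohttp.ClientSession() as session:
            while asyncio.get_running_loop().time() < deadline:
                try:
                    async with session.get(
                        url, timeout=aiohttp.ClientTimeout(total=5)
                    ) as resp:
                        if resp.status == 200:
                            return
                except (aiohttp.ClientError, asyncio.TimeoutError):
                    pass
                await asyncio.sleep(2)
        raise TimeoutError(f"server on port {port} never became healthy")

    # Example: asyncio.run(wait_for_server(30024))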
Closing this issue for now; feel free to reopen if you want to discuss it further.