Page processing loops endlessly.

Open GasperNLP opened this issue 9 months ago • 0 comments

🐛 Describe the bug

The installation looks fine. Everything seems to work, except that when it gets to the actual processing of a page, it just loops. I left it running for two hours without it finishing.

I am running it in a Singularity container on a cluster.

INFO:olmocr.check:pdftoppm is installed and working.
2025-03-08 16:14:34,704 - main - INFO - Got --pdfs argument, going to add to the work queue
2025-03-08 16:14:34,724 - main - INFO - Loading file at ./tekmovanje_dmfa/MaSSA_Drzavno_2002.pdf as PDF document
2025-03-08 16:14:34,725 - main - INFO - Found 1 total pdf paths to add
^MSampling PDFs to calculate optimal length: 0%| | 0/1 [00:00<?, ?it/s]^MSampling PDFs to calculate optimal length: 100%|██████████| 1/1 [00:00<00:00, 297.70it/s]
2025-03-08 16:14:34,729 - main - INFO - Calculated items_per_group: 62 based on average pages per PDF: 8.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-03-08 16:14:34,942 - main - INFO - Starting pipeline with PID 2954079
INFO:olmocr.work_queue:Initialized local queue with 2 work items
2025-03-08 16:14:35,718 - main - WARNING - Attempt 1: All connection attempts failed
2025-03-08 16:14:36,730 - main - WARNING - Attempt 2: All connection attempts failed
2025-03-08 16:14:37,746 - main - WARNING - Attempt 3: All connection attempts failed
2025-03-08 16:14:38,761 - main - WARNING - Attempt 4: All connection attempts failed
2025-03-08 16:14:39,777 - main - WARNING - Attempt 5: All connection attempts failed
2025-03-08 16:14:40,792 - main - WARNING - Attempt 6: All connection attempts failed
2025-03-08 16:14:41,809 - main - WARNING - Attempt 7: All connection attempts failed
2025-03-08 16:14:42,827 - main - WARNING - Attempt 8: All connection attempts failed
2025-03-08 16:14:43,850 - main - WARNING - Attempt 9: All connection attempts failed
2025-03-08 16:14:44,868 - main - WARNING - Attempt 10: All connection attempts failed
2025-03-08 16:14:45,882 - main - WARNING - Attempt 11: All connection attempts failed
2025-03-08 16:14:46,897 - main - WARNING - Attempt 12: All connection attempts failed
2025-03-08 16:14:47,677 - main - INFO - [2025-03-08 16:14:47] server_args=ServerArgs(model_path='/ceph/hpc/data/s24o01-42-users/models/hf_models/olmOCR-7B-0225-preview', tokenizer$
2025-03-08 16:14:47,914 - main - WARNING - Attempt 13: All connection attempts failed
2025-03-08 16:14:48,944 - main - WARNING - Attempt 14: All connection attempts failed
2025-03-08 16:14:49,649 - main - INFO - [2025-03-08 16:14:49] Use chat template for the OpenAI-compatible API server: qwen2-vl
2025-03-08 16:14:49,985 - main - WARNING - Attempt 15: All connection attempts failed
2025-03-08 16:14:51,026 - main - WARNING - Attempt 16: All connection attempts failed
2025-03-08 16:14:52,053 - main - WARNING - Attempt 17: All connection attempts failed
2025-03-08 16:14:53,085 - main - WARNING - Attempt 18: All connection attempts failed
2025-03-08 16:14:54,126 - main - WARNING - Attempt 19: All connection attempts failed
2025-03-08 16:14:55,157 - main - WARNING - Attempt 20: All connection attempts failed
2025-03-08 16:14:56,183 - main - WARNING - Attempt 21: All connection attempts failed
2025-03-08 16:14:57,201 - main - WARNING - Attempt 22: All connection attempts failed
2025-03-08 16:14:58,229 - main - WARNING - Attempt 23: All connection attempts failed
2025-03-08 16:14:59,257 - main - WARNING - Attempt 24: All connection attempts failed
2025-03-08 16:15:00,284 - main - WARNING - Attempt 25: All connection attempts failed
2025-03-08 16:15:01,302 - main - WARNING - Attempt 26: All connection attempts failed
2025-03-08 16:15:01,928 - main - INFO - [2025-03-08 16:15:01 TP0] Overlap scheduler is disabled for multimodal models.
2025-03-08 16:15:01,931 - main - INFO - [2025-03-08 16:15:01 TP0] Automatically reduce --mem-fraction-static to 0.760 because this is a multimodal model.
2025-03-08 16:15:01,931 - main - INFO - [2025-03-08 16:15:01 TP0] Automatically turn off --chunked-prefill-size and disable radix cache for qwen2-vl.
2025-03-08 16:15:01,931 - main - INFO - [2025-03-08 16:15:01 TP0] Init torch distributed begin.
2025-03-08 16:15:02,315 - main - INFO - [2025-03-08 16:15:02 TP0] Load weight begin. avail mem=39.08 GB
2025-03-08 16:15:02,316 - main - WARNING - Attempt 27: All connection attempts failed
2025-03-08 16:15:43,324 - main - INFO - [2025-03-08 16:15:43 TP0] Load weight end. type=Qwen2VLForConditionalGeneration, dtype=torch.bfloat16, avail mem=23.35 GB
2025-03-08 16:15:43,858 - main - WARNING - Attempt 68: All connection attempts failed
2025-03-08 16:15:45,062 - main - WARNING - Attempt 69: All connection attempts failed
2025-03-08 16:15:45,755 - main - INFO - [2025-03-08 16:15:45 TP0] KV Cache is allocated. K size: 6.98 GB, V size: 6.98 GB.
2025-03-08 16:15:45,755 - main - INFO - [2025-03-08 16:15:45 TP0] Memory pool end. avail mem=8.85 GB
2025-03-08 16:15:46,078 - main - WARNING - Attempt 70: All connection attempts failed
2025-03-08 16:15:46,119 - main - INFO - [2025-03-08 16:15:46 TP0] Capture cuda graph begin. This can take up to several minutes.
2025-03-08 16:15:47,092 - main - WARNING - Attempt 71: All connection attempts failed
2025-03-08 16:15:48,109 - main - WARNING - Attempt 72: All connection attempts failed
2025-03-08 16:15:49,124 - main - WARNING - Attempt 73: All connection attempts failed
2025-03-08 16:15:50,139 - main - WARNING - Attempt 74: All connection attempts failed
2025-03-08 16:15:51,155 - main - WARNING - Attempt 75: All connection attempts failed
2025-03-08 16:15:52,170 - main - WARNING - Attempt 76: All connection attempts failed
2025-03-08 16:15:53,185 - main - WARNING - Attempt 77: All connection attempts failed
2025-03-08 16:15:54,199 - main - WARNING - Attempt 78: All connection attempts failed
2025-03-08 16:15:54,849 - main - INFO - ^M 0%| | 0/23 [00:00<?, ?it/s]^M 4%|▍ | 1/23 [00:01<00:41, 1.90s/it]^M 9%|▊ | 2/23 [00:02<00:20, 1.04it/s]^M $
2025-03-08 16:15:54,850 - main - INFO - [2025-03-08 16:15:54 TP0] Capture cuda graph end. Time elapsed: 8.73 s
2025-03-08 16:15:55,215 - main - WARNING - Attempt 79: All connection attempts failed
2025-03-08 16:15:55,881 - main - INFO - [2025-03-08 16:15:55 TP0] max_total_num_tokens=261512, chunked_prefill_size=-1, max_prefill_tokens=16384, max_running_requests=4087, contex$
2025-03-08 16:15:56,241 - main - WARNING - Attempt 80: All connection attempts failed
2025-03-08 16:15:57,260 - main - WARNING - Attempt 81: All connection attempts failed
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 200 OK"
2025-03-08 16:15:58,294 - main - INFO - sglang server is ready.
2025-03-08 16:15:58,294 - main - INFO - Queue remaining: 2
2025-03-08 16:15:58,294 - main - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
2025-03-08 16:15:58,294 - main - INFO - Worker ID
2025-03-08 16:15:58,296 - main - INFO - Worker 0 processing work item 2edc697a5b9d33ecc19efde064271b9e15c149ec
2025-03-08 16:15:58,296 - main - INFO - Created all tasks for 2edc697a5b9d33ecc19efde064271b9e15c149ec
2025-03-08 16:15:58,304 - main - INFO - Got 8 pages to do for ./tekmovanje_dmfa/MaSSA_Drzavno_2002.pdf in worker 0
2025-03-08 16:15:58,600 - main - INFO - [2025-03-08 16:15:58 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-r$
2025-03-08 16:15:58,600 - main - INFO - sglang running req: 0 queue req: 0
Worker ID | started
----------+--------
0         | 8
2025-03-08 16:17:08,416 - main - INFO - Queue remaining: 1
2025-03-08 16:17:08,416 - main - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
2025-03-08 16:17:08,416 - main - INFO - Worker ID | started
----------+--------
0         | 8
2025-03-08 16:17:18,417 - main - INFO - Queue remaining: 1
2025-03-08 16:17:18,417 - main - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
2025-03-08 16:17:18,418 - main - INFO - Worker ID | started
----------+--------
0         | 8
2025-03-08 16:17:28,418 - main - INFO - Queue remaining: 1
2025-03-08 16:17:28,418 - main - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
2025-03-08 16:17:28,419 - main - INFO - Worker ID | started
----------+--------
0         | 8
2025-03-08 16:17:38,419 - main - INFO - Queue remaining: 1
2025-03-08 16:17:38,419 - main - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
2025-03-08 16:17:38,420 - main - INFO - Worker ID | started
----------+--------
0         | 8
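
For anyone triaging a log like this one, here is a minimal Python sketch that pulls out the startup time and the periodic "Queue remaining" reports, so it is easier to see whether the queue is moving at all. The function name `summarize` and the `SAMPLE` excerpt are illustrative only and are not part of olmocr. The regular expressions assume the exact message wording shown in the log above.

```python
import re
from datetime import datetime

# Timestamp prefix used by the "main" logger in the log above.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"
ATTEMPT_RE = re.compile(r"Attempt (\d+): All connection attempts failed")
START_RE = re.compile(TS + r" - main - INFO - Starting pipeline")
READY_RE = re.compile(TS + r" - main - INFO - sglang server is ready")
QUEUE_RE = re.compile(TS + r" - main - INFO - Queue remaining: (\d+)")


def summarize(log_text):
    """Hypothetical helper: summarize server startup and queue progress."""
    fmt = "%Y-%m-%d %H:%M:%S"
    attempts = [int(n) for n in ATTEMPT_RE.findall(log_text)]
    start = START_RE.search(log_text)
    ready = READY_RE.search(log_text)
    startup_seconds = None
    if start and ready:
        delta = datetime.strptime(ready.group(1), fmt) - datetime.strptime(start.group(1), fmt)
        startup_seconds = delta.total_seconds()
    # One (timestamp, remaining) pair per periodic status report.
    queue_reports = [(ts, int(n)) for ts, n in QUEUE_RE.findall(log_text)]
    return {
        "last_attempt": max(attempts, default=0),
        "startup_seconds": startup_seconds,
        "queue_reports": queue_reports,
    }


# Short excerpt copied from the log above.
SAMPLE = """\
2025-03-08 16:14:34,942 - main - INFO - Starting pipeline with PID 2954079
2025-03-08 16:15:57,260 - main - WARNING - Attempt 81: All connection attempts failed
2025-03-08 16:15:58,294 - main - INFO - sglang server is ready.
2025-03-08 16:15:58,294 - main - INFO - Queue remaining: 2
2025-03-08 16:17:08,416 - main - INFO - Queue remaining: 1
2025-03-08 16:17:38,419 - main - INFO - Queue remaining: 1
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the excerpt, this reports the connection warnings ending at attempt 81 once the server answers, and the queue staying at 1 across the later status reports, which matches the stall described above. It only describes the symptom and does not identify the cause.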

Versions

Python 3.11.11
aiohappyeyeballs==2.5.0
aiohttp==3.11.13
aiosignal==1.3.2
annotated-types==0.7.0
anthropic==0.49.0
anyio==4.8.0
archspec @ file:///home/conda/feedstock_root/build_artifacts/archspec_1737352602016/work
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1733250440834/work
astunparse==1.6.3
attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1737819173731/work
beaker-py==1.34.1
beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1733230845337/work
bleach==6.2.0
boltons @ file:///home/conda/feedstock_root/build_artifacts/boltons_1733827268945/work
boto3==1.37.9
botocore==1.37.9
Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1725267488082/work
cached_path==1.6.7
cachetools==5.5.1
certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1734380492396/work/certifi
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1725560564262/work
chardet @ file:///home/conda/feedstock_root/build_artifacts/chardet_1724954797915/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1735929714516/work
click @ file:///home/conda/feedstock_root/build_artifacts/click_1734858813237/work
cloudpickle==3.1.1
cmake==3.31.4
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1733218098505/work
compressed-tensors==0.8.0
conda @ file:///home/conda/feedstock_root/build_artifacts/conda_1737546925511/work
conda-build @ file:///home/conda/feedstock_root/build_artifacts/conda-build_1737106456198/work
conda-libmamba-solver @ file:///home/conda/feedstock_root/build_artifacts/conda-libmamba-solver_1737800978214/work/src
conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1736345463896/work
conda_index @ file:///home/conda/feedstock_root/build_artifacts/conda-index_1718383105992/work
conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1729004031731/work
cryptography==44.0.2
cuda-bindings==12.8.0
cuda-python==12.8.0
datasets==3.3.2
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1733236420667/work
decord==0.6.0
dill==0.3.8
diskcache==5.6.3
distro @ file:///home/conda/feedstock_root/build_artifacts/distro_1734729835256/work
dnspython==2.7.0
docker==7.1.0
einops==0.8.1
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1733208806608/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1733569351617/work
expecttest==0.3.0
fastapi==0.115.11
filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1737517818712/work
filetype==1.2.0
flashinfer==0.1.6+cu124torch2.4
frozendict @ file:///home/conda/feedstock_root/build_artifacts/frozendict_1728841334936/work
frozenlist==1.5.0
fsspec==2024.12.0
ftfy==6.3.1
gguf==0.10.0
google-api-core==2.24.1
google-auth==2.38.0
google-cloud-core==2.4.2
google-cloud-storage==2.19.0
google-crc32c==1.6.0
google-genai==1.2.0
google-resumable-media==2.7.2
googleapis-common-protos==1.69.1
h11==0.14.0
h2 @ file:///home/conda/feedstock_root/build_artifacts/h2_1733298745555/work
hf_transfer==0.1.9
hpack @ file:///home/conda/feedstock_root/build_artifacts/hpack_1733299205993/work
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.28.1
hyperframe @ file:///home/conda/feedstock_root/build_artifacts/hyperframe_1733298771451/work
hypothesis==6.124.7
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1733211830134/work
importlib_metadata==8.6.1
importlib_resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1736252299705/work
interegular==0.3.3
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1734788142186/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1733300866624/work
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1734823942230/work
jiter==0.8.2
jmespath==1.0.1
joblib==1.4.2
jsonpatch @ file:///home/conda/feedstock_root/build_artifacts/jsonpatch_1733814567314/work
jsonpointer @ file:///home/conda/feedstock_root/build_artifacts/jsonpointer_1725302941992/work
jsonschema @ file:///home/conda/feedstock_root/build_artifacts/jsonschema_1733472696581/work
jsonschema-specifications @ file:///tmp/tmpk0f344m9/src
lark==1.2.2
libarchive-c @ file:///home/conda/feedstock_root/build_artifacts/python-libarchive-c_1725302626023/work
libmambapy @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1735806506118/work/libmambapy
lief @ file:///home/conda/feedstock_root/build_artifacts/lief_1726040283347/work/api/python
lingua-language-detector==2.0.2
lintrunner==0.12.7
litellm==1.63.3
llvmlite==0.44.0
lm-format-enforcer==0.10.11
markdown-it-py==3.0.0
markdown2==2.5.3
markdownify==0.13.1
marker-pdf==1.5.2
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1733219680183/work
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1733416936468/work
mdurl==0.1.2
menuinst @ file:///home/conda/feedstock_root/build_artifacts/menuinst_1731146975675/work
mistral_common==1.5.3
modelscope==1.23.2
more-itertools @ file:///home/conda/feedstock_root/build_artifacts/more-itertools_1736883817510/work
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
ninja==1.11.1.3
numba==0.61.0
numpy==1.26.4
nvidia-cublas-cu11==11.11.3.6
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu11==9.1.0.70
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu11==10.3.0.86
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu11==11.7.5.86
nvidia-cusparse-cu12==12.3.1.170
nvidia-ml-py==12.570.86
nvidia-nccl-cu11==2.21.5
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu11==11.8.86
nvidia-nvtx-cu12==12.4.127
olmocr==0.1.58
openai==1.65.4
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
optree==0.14.0
orjson==3.10.15
outlines==0.0.46
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1733203243479/work
pandas==2.2.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1733271261340/work
partial-json-parser==0.2.1.1.post5
pdftext==0.5.1
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1733301927746/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1733327343728/work
pillow==10.4.0
pkginfo @ file:///home/conda/feedstock_root/build_artifacts/pkginfo_1733734533957/work
pkgutil_resolve_name @ file:///home/conda/feedstock_root/build_artifacts/pkgutil-resolve-name_1733344503739/work
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1733232627818/work
pluggy @ file:///home/conda/feedstock_root/build_artifacts/pluggy_1733222765875/work
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1737453357274/work
propcache==0.3.0
proto-plus==1.26.0
protobuf==5.29.3
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1735327328223/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1733302279685/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl#sha256=92c32ff62b5fd8cf325bec5ab90d7be3d2a8ca8c8a3813ff487a8d2002630d1f
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1733569405015/work
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==19.0.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pybase64==1.4.1
pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1732588400443/work
pycountry==24.6.1
pycparser @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_pycparser_1733195786/work
pydantic==2.10.6
pydantic-settings==2.7.1
pydantic_core==2.27.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1736243443484/work
pypdf==5.3.1
pypdfium2==4.30.0
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1733217236728/work
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-etcd==0.4.5
python-multipart==0.0.20
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1733215667876/work
PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1737454647378/work
pyzmq==26.2.1
RapidFuzz==3.12.1
ray==2.43.0
referencing @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_referencing_1737836872/work
regex==2024.11.6
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1733217035951/work
rich==13.9.4
rpds-py @ file:///home/conda/feedstock_root/build_artifacts/rpds-py_1733366613949/work
rsa==4.9
ruamel.yaml @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml_1736248028599/work
ruamel.yaml.clib @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml.clib_1728724459810/work
s3transfer==0.11.4
safetensors==0.5.2
scikit-learn==1.6.1
scipy==1.15.1
sentencepiece==0.2.0
setproctitle==1.3.5
sgl-kernel==0.0.3.post1
sglang==0.4.2
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve @ file:///home/conda/feedstock_root/build_artifacts/soupsieve_1693929250441/work
stack_data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1733569443808/work
starlette==0.46.1
surya-ocr==0.11.1
sympy==1.13.1
threadpoolctl==3.5.0
tiktoken==0.9.0
tokenizers==0.21.0
torch==2.5.1
torchao==0.9.0
torchaudio==2.6.0+cu118
torchelastic==0.2.2
torchvision==0.20.1
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1735661334605/work
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1733367359838/work
transformers==4.48.3
triton==3.1.0
truststore @ file:///home/conda/feedstock_root/build_artifacts/truststore_1729762363021/work
types-dataclasses==0.6.6
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1733188668063/work
tzdata==2025.1
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1734859416348/work
uvicorn==0.34.0
uvloop==0.21.0
vllm==0.6.4.post1
watchfiles==1.0.4
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1733231326287/work
webencodings==0.5.1
websockets==14.2
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.15
xxhash==3.5.0
yarl==1.18.3
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1732827521216/work
zstandard==0.23.0

GasperNLP, Mar 08 '25 15:03