olmocr
Toolkit for linearizing PDFs for LLM datasets/training
### Describe the bug I am trying to install olmOCR on Mac and I get this error: ERROR: pip's dependency resolver does not currently take into account all the...
### The feature, motivation and pitch When converting PDFs to text using OCR, the tool has difficulty identifying and distinguishing headings, such as H1 (main headings) and H2 (subheadings)....
### Describe the bug There appears to be a version conflict in the installation requirements: 1. The project requires `torch>=2.5.1` in its dependencies 2. However, the installation guide directs...
If you want to test the repo in just one click and with a Gradio app, this makes it super simple. No need to install anything; just pull the image and...
### Describe the bug Running the pipeline in a Windows environment results in an error due to permissions. Access is denied to the temporary files created during processing. I...
I encountered the following error when processing multiple PDFs: `025-03-03 12:34:54,705 - __main__ - INFO - Attempt 149: Unexpected status code 403 INFO:httpx: HTTP Request: GET http://localhost:30000/v1/models "HTTP/1.1 403 Forbidden"`...
### Describe the bug In my testing, the output was missing text: multiple lines, whole lines, or entire regions. How can I solve it? ### Versions In my...
Thanks for your wonderful project. Can you estimate when you will release OlmOCR_Qwen2.5? I am very much looking forward to the quality of a fine-tuned Qwen2.5.
### The feature, motivation and pitch Hi there, I would like to ask how to install the olmocr dependencies without internet access. Currently, I am using the...
Fix torch version compatibility issue with flashinfer wheels. The torch requirement in pyproject.toml (torch>=2.5.1) was incompatible with the flashinfer wheels, which are built for torch 2.4. This change updates the requirement...
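
For anyone hitting the torch/flashinfer mismatch described in the excerpt above, here is a minimal sketch of a local sanity check. It only compares the major.minor part of a version string against the torch release a wheel was built for. The helper names `major_minor` and `matches_wheel_build` are hypothetical and not part of olmocr, torch, or flashinfer, and the default of `"2.4"` is taken from the excerpt, not from any flashinfer documentation.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.4.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_wheel_build(installed: str, built_for: str = "2.4") -> bool:
    """Return True when the installed version shares major.minor with the wheel's target.

    Assumption: a prebuilt wheel targets exactly one major.minor torch release.
    """
    return major_minor(installed) == major_minor(built_for)


if __name__ == "__main__":
    # Example version strings only; pass str(torch.__version__) in a real environment.
    for candidate in ("2.4.0+cu121", "2.5.1"):
        print(candidate, matches_wheel_build(candidate))
```

In an environment where torch is installed, passing `str(torch.__version__)` to `matches_wheel_build` gives a quick indication of whether the installed build lines up with the wheel's target before starting the pipeline.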