
Temporary file permissions: access denied on Windows

Open · dajwahl opened this issue 3 days ago · 0 comments

🐛 Describe the bug

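Running the pipeline on Windows fails with a permission error: access is denied to the temporary files created during processing. I believe delete=False has to be passed to NamedTemporaryFile for this to work on Windows. With the default delete=True, Windows does not allow the temporary file to be opened a second time by name while it is still open, which matches the [Errno 13] Permission denied in the log below. With delete=False, the file then has to be removed manually once processing is done.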
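A minimal sketch of the Windows-safe pattern described above, assuming the pipeline only needs a file path it can reopen by name. This is not olmocr's actual code; the helper name reopenable_temp_pdf is hypothetical. It writes the data with delete=False, closes the handle before anyone reopens the path, and deletes the file itself afterwards.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def reopenable_temp_pdf(data: bytes) -> Iterator[str]:
    """Hypothetical helper: write `data` to a named temporary .pdf and yield its path.

    The handle is closed before the path is yielded, so the file can be opened
    again by name. On Windows that is not possible while a delete=True
    NamedTemporaryFile is still open.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        with tmp:
            tmp.write(data)
        # The handle is closed here; the file survives because delete=False.
        yield tmp.name
    finally:
        # delete=False means the caller is responsible for removing the file.
        try:
            os.unlink(tmp.name)
        except FileNotFoundError:
            pass


if __name__ == "__main__":
    with reopenable_temp_pdf(b"%PDF-1.4 placeholder") as path:
        # A second open by name, which fails on Windows with delete=True.
        with open(path, "rb") as f:
            print(f.read(8))
```

On Python 3.12 and later, NamedTemporaryFile also accepts delete_on_close=False, which keeps the file after close() and deletes it when the with block exits. The reporter is on Python 3.11, where delete=False plus manual cleanup is the portable option.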

python -m olmocr.pipeline ./localworkspace --pdfs tests/gnarly_pdfs/horribleocr.pdf

2025-03-02 08:35:25,838 - main - INFO - Got --pdfs argument, going to add to the work queue
2025-03-02 08:35:25,838 - main - INFO - Loading file at tests/gnarly_pdfs/horribleocr.pdf as PDF document
2025-03-02 08:35:25,838 - main - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 0%| | 0/1 [00:00<?, ?it/s]
2025-03-02 08:35:25,844 - main - WARNING - Failed to read tests/gnarly_pdfs/horribleocr.pdf: [Errno 13] Permission denied: 'C:\Users\Chex1\AppData\Local\Temp\tmpipzbokdz.pdf'

Versions

Python 3.11.11

annotated-types==0.7.0
anyio==4.8.0
asttokens==3.0.0
beaker-py==1.34.1
bleach==6.2.0
boto3==1.37.4
botocore==1.37.4
cached_path==1.6.7
cachetools==5.5.2
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
click==8.1.8
colorama==0.4.6
cryptography==44.0.1
decorator==5.2.1
docker==7.1.0
executing==2.2.0
filelock==3.17.0
fsspec==2025.2.0
ftfy==6.3.1
google-api-core==2.24.1
google-auth==2.38.0
google-cloud-core==2.4.2
google-cloud-storage==2.19.0
google-crc32c==1.6.0
google-resumable-media==2.7.2
googleapis-common-protos==1.68.0
h11==0.14.0
httpcore==1.0.7
httpx==0.28.1
huggingface-hub==0.27.1
idna==3.10
ipython==9.0.0
ipython_pygments_lexers==1.1.1
jedi==0.19.2
Jinja2==3.1.5
jmespath==1.0.1
lingua-language-detector==2.0.2
markdown-it-py==3.0.0
markdown2==2.5.3
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.3
# Editable Git install with no remote (olmocr==0.1.58)
-e c:\users\chex1\olmocr
orjson==3.10.15
packaging==24.2
parso==0.8.4
pillow==11.1.0
poppler-utils==0.1.0
prompt_toolkit==3.0.50
proto-plus==1.26.0
protobuf==5.29.3
pure_eval==0.2.3
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
pypdf==5.3.0
pypdfium2==4.30.1
python-dateutil==2.9.0.post0
pywin32==308
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rsa==4.9
s3transfer==0.11.3
safetensors==0.5.3
setproctitle==1.3.5
sgl-kernel==0.0.1
sglang==0.4.3.post2
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
stack-data==0.6.3
sympy==1.13.1
tokenizers==0.21.0
torch==2.6.0+cu126
torchaudio==2.6.0+cu126
torchvision==0.21.0+cu126
tqdm==4.67.1
traitlets==5.14.3
transformers==4.49.0
typing_extensions==4.12.2
urllib3==2.3.0
wcwidth==0.2.13
webencodings==0.5.1
wrapt==1.17.2
zstandard==0.23.0

dajwahl, Mar 02 '25 16:03