Segfault running marker_single, likely caused by an unstable Torch version
Env
- macOS 15, M-series Mac
- Python 3.13.3 (Homebrew)
- marker-pdf 1.7.2 (pipx install marker-pdf[full])
- torch 2.7.0
Repro
> GOOGLE_API_KEY=... marker_single 'filename.pdf' --output_format markdown --use_llm --format_lines
Recognizing layout: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00, 1.18s/it]
LLM layout relabelling: 3it [00:01, 2.14it/s]
Running OCR Error Detection: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 17.09it/s]
Detecting bboxes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:05<00:00, 2.35it/s]
Recognizing Text: 3%|████▉ | 53/2032 [00:10<05:06, 6.45it/s]
zsh: segmentation fault GOOGLE_API_KEY=... marker_single
/opt/homebrew/Cellar/python@3.13/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/multiprocessing/resource_tracker.py:301: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/loky-16920-_odxfu43'}
warnings.warn(
Work-arounds
- TORCH_DEVICE=cpu marker_single … → works (no GPU).
- Reinstall on Python 3.12: pipx reinstall marker-pdf --python python3.12 → works with MPS.
Likely cause
The Torch MPS backend (torch 2.7.0 in this environment) is still unstable on Python 3.13; Marker's multi-process OCR hits a semaphore bug.
Suggestion
- Warn or auto-set TORCH_DEVICE=cpu when running on Python 3.13 + macOS, or
- Pin requires-python to < 3.13 until Torch fixes MPS.
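A minimal sketch of the first suggestion, written as a shell wrapper rather than as a change inside marker. The names needs_cpu_fallback and marker_single_safe and the MARKER_PYTHON variable are hypothetical. The version check assumes MARKER_PYTHON (default python3) is the interpreter marker_single actually runs under, which is not true for a pipx install unless you point it at the pipx venv's Python.

```shell
# Hypothetical wrapper sketching the "warn or auto-set TORCH_DEVICE=cpu" idea.
# Not part of marker; all names here are illustrative.

# Print "cpu" if the OS is macOS (Darwin) and Python is 3.13 or newer (3.x only).
# $1: output of `uname -s`   $2: Python version such as "3.13" or "3.13.3"
needs_cpu_fallback() {
  local os="$1" pyver="$2" major minor rest
  major="${pyver%%.*}"
  rest="${pyver#*.}"
  minor="${rest%%.*}"
  if [ "$os" = "Darwin" ] && [ "$major" -eq 3 ] && [ "$minor" -ge 13 ]; then
    echo "cpu"
  fi
}

# Run marker_single, forcing CPU only when the fallback applies and the user
# has not already set TORCH_DEVICE.
# Assumption: MARKER_PYTHON (default python3) is the interpreter marker_single uses.
marker_single_safe() {
  local pyver device
  pyver="$("${MARKER_PYTHON:-python3}" -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  device="$(needs_cpu_fallback "$(uname -s)" "$pyver")"
  if [ -n "$device" ] && [ -z "${TORCH_DEVICE:-}" ]; then
    echo "warning: Python $pyver on macOS, setting TORCH_DEVICE=$device" >&2
    TORCH_DEVICE="$device" marker_single "$@"
  else
    marker_single "$@"
  fi
}
```

Call it as marker_single_safe 'filename.pdf' --output_format markdown; an explicitly set TORCH_DEVICE is left alone. Other comments in this thread report crashes on Python 3.11 and 3.12 as well, so a version gate alone may not be enough.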
I am still getting segfaults on Apple Silicon using Python 3.11.11 with device=cpu; it only happens when the --force_ocr flag is set.
I get the same error unless I omit --format_lines; without it, it works. I tried both device=cpu and device=mps.
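Since the crash seems to depend on which flags are set, a small loop can narrow it down. This sketch relies on the shell convention that a process killed by SIGSEGV (signal 11) exits with status 128 + 11 = 139, which is how bash and zsh report it. classify_exit and bisect_flags are hypothetical helper names; add your own base flags (for example --use_llm) to the marker_single line as needed.

```shell
# Sketch: run marker_single once per optional flag to see which one crashes.
# A process killed by SIGSEGV (signal 11) is reported by the shell as 128 + 11 = 139.

# Map an exit status to a short label.
classify_exit() {
  case "$1" in
    0) echo "ok" ;;
    139) echo "segfault" ;;
    *) echo "error($1)" ;;
  esac
}

# Usage: bisect_flags FILE.pdf
bisect_flags() {
  local pdf="$1" flag status
  # The empty string is the baseline run with no optional flag.
  for flag in "" --format_lines --force_ocr; do
    # $flag is left unquoted so the empty baseline adds no argument.
    marker_single "$pdf" --output_format markdown $flag >/dev/null 2>&1
    status=$?
    echo "${flag:-baseline}: $(classify_exit "$status")"
  done
}
```

Export TORCH_DEVICE=cpu (or mps) first, then run bisect_flags filename.pdf; each output line says whether that variant completed, segfaulted, or failed for another reason.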
@dirk-fr @SidSethi @malibrated
Can you try a newer version of marker? We fixed a bug in surya (which is packaged with marker) to reduce VRAM usage in environments without flash attention; that bug is probably the source of the error for y'all.
Still breaks on certain PDFs for me, no idea why; see the tests below (M3 MacBook Pro). It works on CPU.
(base) xxxxx@Xxxxxs-MacBook-Pro ~ % marker_single "/Users/xxxxx/Documents/pdf/06_Annual Reports/1996 Annual Report.pdf" \
--processors marker.processors.large_page_to_image.LargePageToImageProcessor \
--output_format markdown --paginate_output --strip_existing_ocr --output_dir "/Users/xxxxx/Documents/pdf/06_Annual Reports"
Recognizing layout: 100%|███████████████████████████| 1/1 [00:01<00:00, 1.44s/it]
Running OCR Error Detection: 100%|██████████████████| 2/2 [00:00<00:00, 18.68it/s]
Detecting bboxes: 100%|█████████████████████████████| 2/2 [00:00<00:00, 2.73it/s]
Recognizing Text: 0%| | 0/897 [00:00<?, ?it/s]zsh: segmentation fault marker_single --processors --output_format markdown --paginate_output
(base) xxxxx@Xxxxxs-MacBook-Pro ~ % /opt/anaconda3/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(base) xxxxx@Xxxxxs-MacBook-Pro ~ % marker_single "/Users/xxxxx/Documents/pdf/06_Annual Reports/1996 Annual Report.pdf" --output_format markdown --paginate_output --strip_existing_ocr --output_dir "/Users/xxxxx/Documents/pdf/06_Annual Reports"
Recognizing layout: 100%|███████████████████████████| 1/1 [00:01<00:00, 1.45s/it]
Running OCR Error Detection: 100%|██████████████████| 2/2 [00:00<00:00, 18.60it/s]
Detecting bboxes: 100%|█████████████████████████████| 2/2 [00:00<00:00, 2.72it/s]
Recognizing Text: 0%| | 0/897 [00:00<?, ?it/s]zsh: segmentation fault marker_single --output_format markdown --paginate_output --strip_existing_oc
(base) xxxxx@Xxxxxs-MacBook-Pro ~ % /opt/anaconda3/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(base) xxxxx@Xxxxxs-MacBook-Pro ~ % TORCH_DEVICE=cpu OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 \
marker_single "/Users/xxxxx/Documents/pdf/06_Annual Reports/1996 Annual Report.pdf" \
--output_format markdown \
--paginate_output \
--strip_existing_ocr \
--output_dir "/Users/xxxxx/Documents/pdf/06_Annual Reports"
Recognizing layout: 100%|███████████████████████████| 1/1 [00:05<00:00, 5.83s/it]
Running OCR Error Detection: 100%|██████████████████| 2/2 [00:00<00:00, 72.70it/s]
Detecting bboxes: 100%|█████████████████████████████| 2/2 [00:08<00:00, 4.05s/it]
Recognizing Text: 100%|█████████████████████████| 896/896 [04:11<00:00, 3.56it/s]
Detecting bboxes: 100%|█████████████████████████████| 1/1 [00:03<00:00, 3.02s/it]
Recognizing Text: 100%|███████████████████████████| 96/96 [00:18<00:00, 5.14it/s]
Recognizing tables: 100%|███████████████████████████| 1/1 [00:02<00:00, 2.68s/it]
2025-06-26 11:17:49,099 [INFO] marker: Saved markdown to /Users/xxxxx/Documents/pdf/06_Annual Reports/1996 Annual Report
2025-06-26 11:17:49,099 [INFO] marker: Total time: 292.76327896118164
(base) xxxxx@Xxxxxs-MacBook-Pro ~ % python -m pip show marker-pdf
Name: marker-pdf
Version: 1.7.5
Summary: Convert documents to markdown with high speed and accuracy.
Home-page:
Author: Vik Paruchuri
Author-email: [email protected]
License: GPL-3.0-or-later
Location: /opt/anaconda3/lib/python3.12/site-packages
Requires: anthropic, click, filetype, ftfy, google-genai, markdown2, markdownify, openai, pdftext, Pillow, pre-commit, pydantic, pydantic-settings, python-dotenv, rapidfuzz, regex, scikit-learn, surya-ocr, torch, tqdm, transformers
Required-by:
Same here, +1. It only works for certain PDFs.
I got the same errors with Python 3.13 and 3.12 while processing an 800-page PDF, with or without --recognition_batch_size. My MacBook is an M1 with 64 GB. Reducing the batch size only makes it crash earlier.
However, when I reduce the page count with --page_range "0-300", it no longer crashes with a segmentation fault. But then I have to export TORCH_DEVICE=cpu to fix the "RuntimeError: stack expects a non-empty TensorList" error.
tl;dr: export TORCH_DEVICE=cpu and use --page_range to reduce the number of pages; that works for me.
Got the same error with Python 3.13 and 3.12 while processing a 100-page PDF. export TORCH_DEVICE=cpu solved the problem.
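The page-range workaround can be scripted for large files. This is a sketch that assumes --page_range accepts 0-indexed START-END ranges with an inclusive end and that you already know the total page count; page_chunks, convert_in_chunks, and the pages_START-END output directory naming are hypothetical.

```shell
# Sketch: convert a large PDF in fixed-size page chunks on CPU.
# Assumption: --page_range "START-END" is 0-indexed with an inclusive end.

# Print one "START-END" range per line covering pages 0..total-1.
page_chunks() {
  local total="$1" size="$2" start=0 end
  while [ "$start" -lt "$total" ]; do
    end=$((start + size - 1))
    # Clamp the last chunk to the final page.
    if [ "$end" -gt $((total - 1)) ]; then
      end=$((total - 1))
    fi
    echo "${start}-${end}"
    start=$((start + size))
  done
}

# Usage: convert_in_chunks FILE.pdf TOTAL_PAGES CHUNK_SIZE OUT_DIR
convert_in_chunks() {
  local pdf="$1" total="$2" size="$3" out="$4" range
  for range in $(page_chunks "$total" "$size"); do
    # One process per chunk; each chunk gets its own output directory
    # so later chunks do not overwrite earlier ones.
    TORCH_DEVICE=cpu marker_single "$pdf" --output_format markdown \
      --page_range "$range" --output_dir "$out/pages_$range" || return 1
  done
}
```

For the 800-page case above, convert_in_chunks file.pdf 800 300 out would run three CPU-only passes over pages 0-299, 300-599 and 600-799, stopping at the first chunk that fails.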
Same error with most files