marker [BUG: Breaking] Marker processes a PDF (<100 pages) and gets stuck for at least 7 hours

Sep 17 '25 14:09 lingyiliu016

Hi @lingyiliu016. Please provide some more context. What hardware are you on? What settings? Can you share the input doc?

Marker is highly performant and can process even a 100 page PDF in less than a minute on a H100 GPU in our testing.

Sep 19 '25 16:09 tarun-menta

@tarun-menta We have, for example, this document, which I'm running Marker on with MPS on my MacBook Pro (Mac15,9) - M3 Max (40 GPU cores, 48 GB RAM, 16 CPU cores). I get, for example:

Recognizing layout: 100%|███████████████████████████████████████████████████████████████████████████████| 44/44 [00:35<00:00, 1.26it/s]
Running OCR Error Detection: 100%|██████████████████████████████████████████████████████████████████████| 66/66 [00:02<00:00, 28.93it/s]
Detecting bboxes: 100%|███████████████████████████████████████████████████████████████████████████████████| 9/9 [00:03<00:00, 2.52it/s]
Recognizing Text: 100%|███████████████████████████████████████████████████████████████████████████████| 229/229 [12:50<00:00, 3.36s/it]
Recognizing tables: 100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:13<00:00, 3.44s/it]
Detecting bboxes: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.44it/s]
Recognizing Text: 100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [00:05<00:00, 1.69s/it]
LLM processors running: 100%|███████████████████████████████████████████████████████████████████████████| 11/11 [00:17<00:00, 1.55s/it]

So it takes about 15 minutes to process this document. I did set "recognition_batch_size": 4, as otherwise the GPU memory got overloaded in practice. In that case I get:

Error: command buffer exited with error status. The Metal Performance Shaders operations encoded on it may not have completed.
Error: (null) Discarded (victim of GPU error/recovery) (00000005:kIOGPUCommandBufferCallbackErrorInnocentVictim) <AGXG15XFamilyCommandBuffer: 0x7f955d500> label = <none> device = <AGXG15CDevice: 0x10699de00> name = Apple M3 Max commandQueue = <AGXG15XFamilyCommandQueue: 0x1639a5000> label = <none> device = <AGXG15CDevice: 0x10699de00> name = Apple M3 Max retainedReferences = 1

and consequently often:

UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d

which is a known issue. Even with that setting, it's just much slower in general than I expect it to be - we can often only process 7 such documents per hour when running a single process. What do you think we should do?

Note that I do pass in a processor_list:

processor_list = [
    "marker.processors.order.OrderProcessor",
    "marker.processors.line_merge.LineMergeProcessor",
    "marker.processors.blockquote.BlockquoteProcessor",
    "marker.processors.code.CodeProcessor",
    "marker.processors.document_toc.DocumentTOCProcessor",
    "marker.processors.equation.EquationProcessor",
    "marker.processors.footnote.FootnoteProcessor",
    "marker.processors.ignoretext.IgnoreTextProcessor",
    "marker.processors.line_numbers.LineNumbersProcessor",
    "marker.processors.list.ListProcessor",
    "marker.processors.page_header.PageHeaderProcessor",
    "marker.processors.sectionheader.SectionHeaderProcessor",
    "marker.processors.table.TableProcessor",
    "marker.processors.text.TextProcessor",
    "marker.processors.reference.ReferenceProcessor",
    "marker.processors.debug.DebugProcessor",
    "marker.processors.llm.llm_image_description.LLMImageDescriptionProcessor",
]
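For readers following along, here is a minimal sketch of how a batch size and a custom processor list like the ones above might be passed to Marker from Python. It assumes the `PdfConverter` interface shown in the Marker README (`artifact_dict`, `processor_list`, `config`), which may differ between versions. The helper names `build_config` and `convert` are hypothetical. The list above also contains an LLM processor, which presumably needs an LLM service configured as well; that part is not shown.

```python
def build_config(recognition_batch_size=4):
    """Return a config dict; only this key is taken from the thread."""
    return {"recognition_batch_size": recognition_batch_size}


def convert(pdf_path, processor_list=None, recognition_batch_size=4):
    """Convert one PDF to text, assuming the README-style Marker API."""
    # Imported lazily so build_config() is usable without marker installed.
    from marker.converters.pdf import PdfConverter
    from marker.models import create_model_dict
    from marker.output import text_from_rendered

    converter = PdfConverter(
        artifact_dict=create_model_dict(),   # loads the model weights
        processor_list=processor_list,       # list of dotted class paths, or None
        config=build_config(recognition_batch_size),
    )
    rendered = converter(pdf_path)
    text, _, _images = text_from_rendered(rendered)
    return text
```

Calling `convert("some.pdf", processor_list=processor_list)` with the list from this comment would be the intended usage, under the assumptions stated above.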

Sep 26 '25 07:09 MauritsBrinkman

@tarun-menta @VikParuchuri Any idea? This is a blocking issue for us when using Marker

Sep 29 '25 13:09 MauritsBrinkman

It looks like OCR is taking up most of the time here. The recognition model can unfortunately be slow on MPS for longer documents. Are you able to use a GPU at all, or is MPS your only option?
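The share of time spent in each stage can be checked directly from the progress output quoted earlier in the thread. The following is a minimal sketch, assuming tqdm-style lines of the form shown there. The progress bars are shortened in the sample, and the names `STAGE_RE`, `to_seconds`, and `stage_totals` are illustrative, not part of Marker.

```python
import re

# Matches completed tqdm lines such as
# "Recognizing Text: 100%|███| 229/229 [12:50<00:00, 3.36s/it]"
STAGE_RE = re.compile(
    r"([A-Za-z][A-Za-z ]*?):\s+100%\|[^|]*\|\s*\d+/\d+\s+\[(\d+(?::\d+)+)<"
)

# Timings copied from the log in this thread; bars shortened for readability.
SAMPLE_LOG = """\
Recognizing layout: 100%|███| 44/44 [00:35<00:00, 1.26it/s]
Running OCR Error Detection: 100%|███| 66/66 [00:02<00:00, 28.93it/s]
Detecting bboxes: 100%|███| 9/9 [00:03<00:00, 2.52it/s]
Recognizing Text: 100%|███| 229/229 [12:50<00:00, 3.36s/it]
Recognizing tables: 100%|███| 4/4 [00:13<00:00, 3.44s/it]
Detecting bboxes: 100%|███| 1/1 [00:00<00:00, 1.44it/s]
Recognizing Text: 100%|███| 3/3 [00:05<00:00, 1.69s/it]
LLM processors running: 100%|███| 11/11 [00:17<00:00, 1.55s/it]
"""


def to_seconds(clock):
    """Convert 'MM:SS' or 'H:MM:SS' to a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def stage_totals(log):
    """Sum elapsed seconds per stage name; repeated stages are added up."""
    totals = {}
    for name, elapsed in STAGE_RE.findall(log):
        totals[name] = totals.get(name, 0) + to_seconds(elapsed)
    return totals


totals = stage_totals(SAMPLE_LOG)
grand_total = sum(totals.values())
for name, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} {secs:5d}s  {100 * secs / grand_total:5.1f}%")
```

On the log from this thread, the two "Recognizing Text" passes add up to 775 of roughly 845 seconds, about 92% of the run.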

Sep 29 '25 13:09 VikParuchuri

MPS is my only option here. However, I can see from the system that MPS is using my GPU (40 cores) - Macbook Pro M3 Max, 48GB unified memory.

Sep 29 '25 15:09 MauritsBrinkman

For context, here is a related issue I opened that has been closed for some reason: https://github.com/datalab-to/marker/issues/875

Sep 30 '25 16:09 EHadoux

Same thing here. Mine just freezes after almost 100%.

Oct 02 '25 01:10 Puyandeh

@VikParuchuri @tarun-menta Any advice is appreciated here. Note that my team and I are willing to contribute if needed. For us, this is currently a bottleneck when using Marker: its OCR is highly customizable, especially regarding image descriptions, so it's our preference, but the speed when running it on our own infra is just too slow to make it usable for our purposes, as we need to do OCR on thousands of text-heavy documents. Please let me know where I can help or what you think can be done in the short term to improve on this.

Oct 02 '25 17:10 MauritsBrinkman

Similar to this issue and #875, I experienced a speedup from 27min. to 50s by simply reverting to marker version 1.8.0 (vs. latest 1.10.1). This is on Apple M4 32GB. Looking through the releases, I realise there are some different models used but this seems like too big a difference (maybe a slight improvement in output quality in my small test of two files).

Oct 24 '25 06:10 mattijsdp

It's possible that your torch version upgraded when you upgraded marker, and that fixed some MPS bugs. Can you check what your torch version is now?
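One way to report this is to print the installed package versions and the MPS status in one go. This is a minimal sketch, assuming the PyPI distribution names `marker-pdf` and `surya-ocr`; the helper names `installed_versions` and `mps_status` are illustrative.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mps_status():
    """Report whether torch was built with MPS and whether MPS is usable."""
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this environment
    return {
        "built": torch.backends.mps.is_built(),
        "available": torch.backends.mps.is_available(),
    }


print(installed_versions(["marker-pdf", "surya-ocr", "torch"]))
print(mps_status())
```

Running this in both the fast and the slow environment makes it easier to see whether torch, marker, or another dependency changed between them.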

Apologies for the inconvenience for everyone in this thread. Unfortunately the issue is a little deeper in pytorch-mps land, which makes it hard for us to address directly.

Oct 24 '25 15:10 tarun-menta

In my case, both runs were with torch 2.9.0

Oct 26 '25 19:10 mattijsdp

FYI, same issue for me on M4 with 128GB memory: https://github.com/datalab-to/marker/issues/875#issuecomment-3449895382

Oct 27 '25 07:10 kidwellj

Can you please provide code that uses the LLM?

Nov 08 '25 12:11 ankit8347