[BUG: Breaking] Marker gets stuck for at least 7 hours when processing a PDF (<100 pages)
Hi @lingyiliu016. Please provide some more context. What hardware are you on? What settings? Can you share the input doc?
Marker is highly performant and can process even a 100-page PDF in less than a minute on an H100 GPU in our testing.
@tarun-menta We have, for example, this document, which I'm running through marker with MPS on my MacBook Pro (Mac15,9): M3 Max (40 GPU cores, 48 GB RAM, 16 CPU cores). I get, e.g.:

    Recognizing layout: 100%|██████████| 44/44 [00:35<00:00, 1.26it/s]
    Running OCR Error Detection: 100%|██████████| 66/66 [00:02<00:00, 28.93it/s]
    Detecting bboxes: 100%|██████████| 9/9 [00:03<00:00, 2.52it/s]
    Recognizing Text: 100%|██████████| 229/229 [12:50<00:00, 3.36s/it]
    Recognizing tables: 100%|██████████| 4/4 [00:13<00:00, 3.44s/it]
    Detecting bboxes: 100%|██████████| 1/1 [00:00<00:00, 1.44it/s]
    Recognizing Text: 100%|██████████| 3/3 [00:05<00:00, 1.69s/it]
    LLM processors running: 100%|██████████| 11/11 [00:17<00:00, 1.55s/it]

Hence it runs for about 15 minutes to process this document. Note that I did set "recognition_batch_size": 4, as otherwise the GPU memory gets overloaded in practice. Without it I get:

    Error: command buffer exited with error status. The Metal Performance Shaders operations encoded on it may not have completed.
    Error: (null) Discarded (victim of GPU error/recovery) (00000005:kIOGPUCommandBufferCallbackErrorInnocentVictim)
    <AGXG15XFamilyCommandBuffer: 0x7f955d500>
    label = <none>
    device = <AGXG15CDevice: 0x10699de00>
    name = Apple M3 Max
    commandQueue = <AGXG15XFamilyCommandQueue: 0x1639a5000>
    label = <none>
    device = <AGXG15CDevice: 0x10699de00>
    name = Apple M3 Max
    retainedReferences = 1

and consequently often:

    UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
    warnings.warn('resource_tracker: There appear to be %d

which is a known issue. It's just much slower in general than I expect it to be: we can often only process 7 such documents per hour when running a single process. What do you think we should do?
Note that I do pass in a processor_list:

    processor_list = [
        "marker.processors.order.OrderProcessor",
        "marker.processors.line_merge.LineMergeProcessor",
        "marker.processors.blockquote.BlockquoteProcessor",
        "marker.processors.code.CodeProcessor",
        "marker.processors.document_toc.DocumentTOCProcessor",
        "marker.processors.equation.EquationProcessor",
        "marker.processors.footnote.FootnoteProcessor",
        "marker.processors.ignoretext.IgnoreTextProcessor",
        "marker.processors.line_numbers.LineNumbersProcessor",
        "marker.processors.list.ListProcessor",
        "marker.processors.page_header.PageHeaderProcessor",
        "marker.processors.sectionheader.SectionHeaderProcessor",
        "marker.processors.table.TableProcessor",
        "marker.processors.text.TextProcessor",
        "marker.processors.reference.ReferenceProcessor",
        "marker.processors.debug.DebugProcessor",
        "marker.processors.llm.llm_image_description.LLMImageDescriptionProcessor",
    ]
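For anyone trying to reproduce this, the setup described above might look roughly like the sketch below. It assumes marker's Python API as shown in its README (`PdfConverter`, `create_model_dict`); the helper names `build_config`, `timed`, and `convert` are hypothetical, the processor list is abbreviated, and the LLM processor is left out because it would also need an LLM service configured.

```python
import time

# Abbreviated subset of the processor list quoted in this thread (assumption:
# a shorter list still runs, though the output will differ from the full one).
PROCESSORS = [
    "marker.processors.order.OrderProcessor",
    "marker.processors.line_merge.LineMergeProcessor",
    "marker.processors.table.TableProcessor",
    "marker.processors.text.TextProcessor",
]


def build_config(recognition_batch_size=4):
    """Return a config dict with the batch size mentioned in the thread."""
    return {"recognition_batch_size": recognition_batch_size}


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def convert(pdf_path, config, processors):
    """Run one conversion; marker is imported lazily so the helpers above
    can be used without it installed."""
    from marker.converters.pdf import PdfConverter
    from marker.models import create_model_dict

    converter = PdfConverter(
        artifact_dict=create_model_dict(),
        processor_list=processors,
        config=config,
    )
    return converter(pdf_path)
```

A call such as `timed(convert, "input.pdf", build_config(4), PROCESSORS)` (with a placeholder path) would then give one wall-clock number per batch size to compare.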
@tarun-menta @VikParuchuri Any idea? This is a blocking issue for us when using Marker
It looks like OCR is taking up most of the time here. The recognition model can unfortunately be slow on MPS for longer documents. Are you able to use a GPU at all, or is MPS your only option?
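To put a number on that, here is a minimal sketch that sums the elapsed time per stage from the tqdm progress lines posted earlier in the thread. The log lines are copied from that output with the bars shortened; the function names are hypothetical, and the regex assumes tqdm's default bar layout.

```python
import re

# Matches "<stage>: NN%|<bar>| done/total [elapsed<remaining, rate]"
PROGRESS_RE = re.compile(
    r"^(?P<stage>[^:]+):\s*\d+%\|.*?\|\s*\d+/\d+\s*\[(?P<elapsed>[\d:]+)<"
)

LOG = [
    "Recognizing layout: 100%|██████████| 44/44 [00:35<00:00, 1.26it/s]",
    "Running OCR Error Detection: 100%|██████████| 66/66 [00:02<00:00, 28.93it/s]",
    "Detecting bboxes: 100%|██████████| 9/9 [00:03<00:00, 2.52it/s]",
    "Recognizing Text: 100%|██████████| 229/229 [12:50<00:00, 3.36s/it]",
    "Recognizing tables: 100%|██████████| 4/4 [00:13<00:00, 3.44s/it]",
    "Detecting bboxes: 100%|██████████| 1/1 [00:00<00:00, 1.44it/s]",
    "Recognizing Text: 100%|██████████| 3/3 [00:05<00:00, 1.69s/it]",
    "LLM processors running: 100%|██████████| 11/11 [00:17<00:00, 1.55s/it]",
]


def elapsed_seconds(clock):
    """Convert tqdm's MM:SS or H:MM:SS elapsed string to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def stage_totals(log_lines):
    """Sum elapsed seconds per stage name; repeated stages are added up."""
    totals = {}
    for line in log_lines:
        match = PROGRESS_RE.match(line.strip())
        if match:
            stage = match.group("stage").strip()
            totals[stage] = totals.get(stage, 0) + elapsed_seconds(match.group("elapsed"))
    return totals


if __name__ == "__main__":
    totals = stage_totals(LOG)
    overall = sum(totals.values())
    for stage, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{stage}: {seconds}s ({seconds / overall:.0%})")
```

On these lines, "Recognizing Text" accounts for 775 of 845 seconds, roughly 92% of the run.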
MPS is my only option here. However, I can see from the system that MPS is using my GPU (40 cores) - Macbook Pro M3 Max, 48GB unified memory.
For context, here is a related issue I opened, which was closed for some reason: https://github.com/datalab-to/marker/issues/875
Same thing here. Mine just freezes after almost 100%.
@VikParuchuri @tarun-menta Any advice is appreciated here. Note that my team and I are willing to contribute if needed. For us, this is currently a bottleneck when using Marker: it's our preferred option because the OCR it provides is highly customizable, especially regarding image descriptions, but the speed when running it on our own infra is just too slow to make it usable for our purposes, as we need to OCR thousands of text-heavy documents. Please let me know where I can help or what you think can be done in the short term to improve on this.
Similar to this issue and #875, I saw a speedup from 27 min to 50 s simply by reverting to marker version 1.8.0 (vs. the latest, 1.10.1). This is on an Apple M4 with 32 GB. Looking through the releases, I realise that different models are used, but this seems like too big a difference (with maybe a slight improvement in output quality in my small test of two files).
It's possible that your torch version upgraded when you upgraded marker, and that fixed some MPS bugs. Can you check what your torch version is now?
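One quick way to check is to read the installed versions from package metadata, run once in each environment being compared. This is a minimal sketch: the function name is hypothetical, and it assumes the packages were installed under their PyPI names `marker-pdf` and `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("marker-pdf", "torch")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Posting the output for both the fast and the slow setup makes it easier to tell whether marker or torch changed between them.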
Apologies for the inconvenience for everyone in this thread. Unfortunately the issue is a little deeper in pytorch-mps land, which makes it hard for us to address directly.
In my case, both runs were with torch 2.9.0
FYI, same issue for me on M4 with 128GB memory: https://github.com/datalab-to/marker/issues/875#issuecomment-3449895382
Could you please provide example code that uses the LLM?