Could you also evaluate MinerU, GOT-OCR, olmOCR, and MarkItDown?
I've collected some notable pipelines at https://github.com/dantetemplar/pdf-extraction-agenda
Thanks for sharing. We'll try to integrate more tools as we have bandwidth. olmOCR is integrated into our benchmarks, and we'll add the results to our README shortly.
Hello, I have tested olmOCR, but I feel it does not perform well and makes more mistakes than MinerU. Do you know of any better tools that perform well on PDF OCR?
Looking forward to your benchmark results against olmOCR. Theirs is awesome, but it is very slow even on my 4090 GPU. Thanks.