benchmarks
Benchmarking PDF libraries
https://ad-publications.cs.uni-freiburg.de/benchmark.pdf: I just found this. It might be interesting.
Test run on Python 3.8, Windows 7:
- I took 4 arbitrary page numbers (pages 4, 6, 8, 9).
- For each of the PDF files listed in the benchmark, I extracted those pages from...
see https://github.com/py-pdf/pypdf/issues/1789
I'd be interested in seeing pikepdf in the image extraction benchmark. It provides some pretty sophisticated code that can, in many cases, extract and save PDF images without needing to...
The current text extraction benchmark says nothing about how well newline characters are recognized. We need a new benchmark for that.
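One possible way to score this, sketched here as an assumption rather than anything the benchmark currently does: reduce both the ground-truth text and the extracted text to the positions of their line breaks, counted in non-whitespace characters, and compare those position sets with precision, recall and F1. The names `newline_positions` and `newline_f1` are hypothetical. The approach assumes the extractor reproduces the non-whitespace characters correctly; if it drops or garbles characters, the offsets drift and the score will understate its newline handling.

```python
from __future__ import annotations


def newline_positions(text: str) -> set[int]:
    """Return the non-whitespace character offsets at which a line break occurs.

    Leading/trailing whitespace is ignored, and consecutive newlines at the
    same offset (blank lines) collapse into a single position.
    """
    positions: set[int] = set()
    count = 0  # number of non-whitespace characters seen so far
    for ch in text.strip():
        if ch == "\n":
            positions.add(count)
        elif not ch.isspace():
            count += 1
    return positions


def newline_f1(expected: str, extracted: str) -> float:
    """F1 score of line-break positions in `extracted` against `expected`."""
    truth = newline_positions(expected)
    found = newline_positions(extracted)
    if not truth and not found:
        return 1.0  # neither text has a line break: nothing to get wrong
    hits = len(truth & found)
    if hits == 0:
        return 0.0
    precision = hits / len(found)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For example, if the ground truth is `"Hello\nworld\nfoo"` and a library returns `"Hello\nworld foo"`, the first break is found but the second is missed, so precision is 1.0, recall is 0.5 and F1 is about 0.67.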
Add table extraction benchmark.
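A table benchmark would need some cell-level score. A minimal sketch, assuming both the ground truth and the library output have already been converted to lists of rows of cell strings (that conversion is library-specific and not shown): count cells that match at the same row/column position and report F1, so that both missing and spurious cells lower the score. `cell_f1` is a hypothetical name, and exact matching after stripping whitespace is a deliberately strict choice; a real benchmark might prefer fuzzy cell matching or a structure-aware metric.

```python
from typing import Sequence

# A table is a sequence of rows; each row is a sequence of cell strings.
Table = Sequence[Sequence[str]]


def cell_f1(expected: Table, extracted: Table) -> float:
    """F1 score over cells that match at the same (row, column) position."""
    n_expected = sum(len(row) for row in expected)
    n_extracted = sum(len(row) for row in extracted)
    if n_expected == 0 and n_extracted == 0:
        return 1.0  # both tables empty
    hits = 0
    # zip stops at the shorter row/table, so extra rows or columns never match
    for row_exp, row_ext in zip(expected, extracted):
        for cell_exp, cell_ext in zip(row_exp, row_ext):
            if cell_exp.strip() == cell_ext.strip():
                hits += 1
    if hits == 0:
        return 0.0
    precision = hits / n_extracted
    recall = hits / n_expected
    return 2 * precision * recall / (precision + recall)
```

Because it is position-based, a single spurious column inserted at the start of a row shifts every later cell and is penalized heavily; that may or may not be the behaviour a table benchmark wants.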
PdfAlto
I was wondering if you could add pdfalto to the benchmark: https://github.com/kermitt2/pdfalto
https://github.com/VikParuchuri/marker
Hello! Thank you for these benchmarks; I continue to find them useful, in particular since I have developed [PLAYA-PDF](https://github.com/dhdaines/playa), a PDF parsing and analysis library originally based on pdfminer.six but...
I believe adding a multithreading performance comparison to the benchmark would be beneficial. It could provide valuable insights into how different PDF libraries handle concurrent tasks, especially in a multithreaded...
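A rough sketch of such a comparison, using only the standard library: run the same per-file task through a `ThreadPoolExecutor` at several worker counts, record wall-clock time, and check that every run returns the same results, which doubles as a crude thread-safety check. The task is a placeholder; in the benchmark it would presumably be a function that opens one PDF with a given library and returns its text. `time_concurrent`, `benchmark_threads` and `placeholder_extract` are hypothetical names. Keep in mind that in CPython, pure-Python libraries are likely to be limited by the GIL, so speedups will probably show up mainly for I/O-bound work or for libraries whose native code releases the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def time_concurrent(
    task: Callable[[str], str], inputs: Iterable[str], workers: int
) -> Tuple[float, List[str]]:
    """Run `task` over every input on a pool of `workers` threads.

    Returns (wall-clock seconds, results in input order).
    """
    items = list(inputs)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(task, items))  # map preserves input order
    return time.perf_counter() - start, results


def benchmark_threads(
    task: Callable[[str], str],
    inputs: Iterable[str],
    worker_counts: Sequence[int] = (1, 2, 4, 8),
) -> Dict[int, float]:
    """Time `task` at each worker count and check that all runs agree."""
    items = list(inputs)
    timings: Dict[int, float] = {}
    reference = None
    for workers in worker_counts:
        elapsed, results = time_concurrent(task, items, workers)
        if reference is None:
            reference = results
        elif results != reference:
            # Differing output under concurrency hints at shared mutable state.
            raise RuntimeError(f"results with {workers} workers differ from the first run")
        timings[workers] = elapsed
    return timings


if __name__ == "__main__":
    def placeholder_extract(path: str) -> str:
        # Hypothetical stand-in for "open one PDF and return its text".
        time.sleep(0.05)
        return f"text of {path}"

    files = [f"doc{i}.pdf" for i in range(8)]
    for workers, seconds in benchmark_threads(placeholder_extract, files).items():
        print(f"{workers} worker(s): {seconds:.3f} s")
```

Comparing each library's timings against its own single-worker run, rather than across libraries, keeps the comparison about scaling under concurrency instead of raw speed.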