benchmarks

Benchmarking PDF libraries

Results 10 benchmarks issues

https://ad-publications.cs.uni-freiburg.de/benchmark.pdf - just found that. Might be interesting

Test run on Python 3.8, Windows 7:
- I took 4 arbitrary page numbers (pages 4, 6, 8, 9).
- For each of the PDF files listed in the benchmark I extracted those pages from...

enhancement

see https://github.com/py-pdf/pypdf/issues/1789

I'd be interested in seeing pikepdf in the image extraction benchmark. It provides some pretty sophisticated code that can, in many cases, extract and save PDF images without needing to...

The current text extraction benchmark says nothing about how well newline characters are recognized. We need a new benchmark for that.
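One possible way to score newline recognition, offered only as a rough sketch and not as the benchmark's actual method: treat every line break as a pair (last word before the break, first word after it) and compare the pairs found in the extracted text against those in a ground-truth text. The function names `newline_boundaries` and `newline_scores` are hypothetical, and the sketch assumes whitespace-separated words and ignores blank lines.

```python
from collections import Counter
from typing import Tuple


def newline_boundaries(text: str) -> Counter:
    """Count (word before break, word after break) pairs in a text."""
    lines = [line.split() for line in text.splitlines()]
    lines = [words for words in lines if words]  # blank lines are ignored
    return Counter((prev[-1], nxt[0]) for prev, nxt in zip(lines, lines[1:]))


def newline_scores(expected: str, extracted: str) -> Tuple[float, float]:
    """Return (precision, recall) of line breaks in `extracted`.

    Assumption: `expected` is a hand-checked ground truth for the same page.
    """
    want = newline_boundaries(expected)
    got = newline_boundaries(extracted)
    matched = sum((want & got).values())  # multiset intersection
    precision = matched / sum(got.values()) if got else 1.0
    recall = matched / sum(want.values()) if want else 1.0
    return precision, recall


if __name__ == "__main__":
    truth = "Hello world\nfoo bar\nbaz"
    output = "Hello world foo bar\nbaz"  # one line break was lost
    print(newline_scores(truth, output))
```

A library that merges lines loses recall, while one that inserts spurious breaks loses precision, so reporting both would keep the two failure modes apart.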

Add table extraction benchmark.

I was wondering if you could add pdfalto to the benchmark: https://github.com/kermitt2/pdfalto

https://github.com/VikParuchuri/marker

Hello! Thank you for these benchmarks, I continue to find them useful! In particular since I have developed [PLAYA-PDF](https://github.com/dhdaines/playa), a PDF parsing and analysis library originally based on pdfminer.six but...

I believe adding a multithreading performance comparison to the benchmark would be beneficial. It could provide valuable insights into how different PDF libraries handle concurrent tasks, especially in a multithreaded...