benchmarks issues

Review literature

https://ad-publications.cs.uni-freiburg.de/benchmark.pdf - just found that. Might be interesting

pdfrw vs pypdf page extraction & merge

7

Test run on Python 3.8, Windows 7:
- I took 4 arbitrary page numbers (pages 4, 6, 8, 9).
- For each of the PDF files listed in the benchmark, I extracted those pages from...

abubelinha

enhancement
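
The page-extraction step described in this issue could look roughly like the following sketch. It is not the reporter's script: the helper names `to_zero_based` and `extract_pages` and the file paths are hypothetical, and it assumes the third-party `pypdf` package is installed. Only the page numbers (4, 6, 8, 9) come from the issue text.

```python
def to_zero_based(page_numbers):
    """Convert human-readable page numbers (1-based) to list indices (0-based)."""
    return [n - 1 for n in page_numbers]


def extract_pages(src_path, dst_path, page_numbers):
    """Copy the given 1-based pages of src_path into a new PDF at dst_path.

    Hypothetical helper for illustration; requires the third-party pypdf package.
    """
    # Imported here so the pure helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in to_zero_based(page_numbers):
        writer.add_page(reader.pages[index])
    with open(dst_path, "wb") as fh:
        writer.write(fh)


# The page numbers mentioned in the issue.
PAGES = [4, 6, 8, 9]
print(to_zero_based(PAGES))
```

Calling `extract_pages("input.pdf", "subset.pdf", PAGES)` on a document with at least nine pages would write a four-page file; timing that call per library is one way a comparison like the one in this issue might be set up.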

Add "math extraction" benchmark

see https://github.com/py-pdf/pypdf/issues/1789

MartinThoma

Add pikepdf image extractor

I'd be interested in seeing pikepdf in the image extraction benchmark. It provides some pretty sophisticated code that can, in many cases, extract and save PDF images without needing to...

mara004

Add text extraction benchmark for paragraph recognition

The current text extraction benchmark says nothing about how well newline characters are recognized. We need a new benchmark for that.

MartinThoma

Table extraction

1

Add table extraction benchmark.

Yagniksojitra

PdfAlto

1

I was wondering if you could add pdfalto to the benchmark: https://github.com/kermitt2/pdfalto

lfoppiano

Add marker benchmarks

https://github.com/VikParuchuri/marker

JonZeolla

Add PLAYA-PDF and update release dates

Hello! Thank you for these benchmarks, I continue to find them useful! In particular since I have developed [PLAYA-PDF](https://github.com/dhdaines/playa), a PDF parsing and analysis library originally based on pdfminer.six but...

dhdaines

Suggestion: Multithreading Performance Comparison

1

I believe adding a multithreading performance comparison to the benchmark would be beneficial. It could provide valuable insights into how different PDF libraries handle concurrent tasks, especially in a multithreaded...

cyy-2024
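
One possible shape for such a comparison is sketched below, using only the Python standard library. The names `time_run` and `fake_extract` are hypothetical: `fake_extract` is a stand-in that sleeps instead of calling a real PDF library, so the numbers it produces say nothing about any actual library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_run(task, items, workers):
    """Apply task to every item using `workers` threads.

    Returns (elapsed wall-clock seconds, list of results in input order).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(task, items))
    return time.perf_counter() - start, results


def fake_extract(path):
    """Hypothetical placeholder for a library's text-extraction call."""
    time.sleep(0.01)  # simulated work; replace with a real library call
    return len(path)


if __name__ == "__main__":
    files = ["a.pdf", "bb.pdf", "ccc.pdf", "dddd.pdf"]
    for workers in (1, 2, 4):
        elapsed, _ = time_run(fake_extract, files, workers)
        print(f"{workers} thread(s): {elapsed:.3f} s")
```

Because the placeholder only sleeps, it scales almost perfectly with thread count. A real extraction call that is CPU-bound in pure Python may show little or no speedup under threads, which is the kind of difference a benchmark like this would be meant to surface.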
