pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

Results 12 pdf2dataset issues

Save the page counting and don't perform it again when resuming the processing

feature-request
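
One way the request above could be approached is to persist the page counts to disk and reuse them on resume. The sketch below is only an illustration, not pdf2dataset's implementation: `load_page_counts`, `count_pages_cached`, the `count_pages` callable, and the JSON cache file are all hypothetical names chosen for this example.

```python
import json
from pathlib import Path


def load_page_counts(cache_path):
    """Return the cached {path: page_count} mapping, or {} if no cache exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    return {}


def count_pages_cached(pdf_paths, count_pages, cache_path):
    """Count pages only for files missing from the cache, then persist it.

    `count_pages` is a hypothetical callable (path -> int) standing in for
    whatever page-counting routine the library actually uses.
    """
    counts = load_page_counts(cache_path)
    for path in map(str, pdf_paths):
        if path not in counts:
            counts[path] = count_pages(path)  # the expensive call
    Path(cache_path).write_text(json.dumps(counts))
    return counts
```

On a resumed run, only documents that are not yet in the cache file would be counted again; a real implementation would probably also need to invalidate entries when a file changes.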

enhancement
good first issue

enhancement
good first issue

When the IO is too slow, it's probably a good idea to start the processing before the page counting ends: - [ ] When `chunksize` is provided, start processing...

enhancement

As we have many non-Python dependencies, having a ready-to-use `Dockerfile` would be very handy.

enhancement
good first issue

Currently, a document-level feature (one with the same value for all pages) is extracted again for each page.

enhancement
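
One possible reading of the issue above is that document-level features should be computed once per document and then copied onto each page row. The following is a minimal sketch under that assumption; `extract_rows`, `page_features`, and `doc_features` are hypothetical names for this example and are not part of pdf2dataset's API.

```python
def extract_rows(path, num_pages, page_features, doc_features):
    """Build one row per page, computing document-level features only once.

    `page_features` maps a feature name to a callable (path, page) -> value.
    `doc_features` maps a feature name to a callable (path) -> value.
    Both mappings are assumptions made for this illustration.
    """
    # Document-level features have the same value for every page,
    # so compute them a single time per document.
    doc_values = {name: fn(path) for name, fn in doc_features.items()}

    rows = []
    for page in range(1, num_pages + 1):
        row = {"path": path, "page": page, **doc_values}
        for name, fn in page_features.items():
            row[name] = fn(path, page)
        rows.append(row)
    return rows
```

The resulting list of dicts can be passed directly to `pandas.DataFrame` if a DataFrame is wanted.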

- [ ] Document routines
- [ ] Document module
- [ ] Document classes

documentation