pdf2dataset
Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features
Results
12
pdf2dataset issues
Save the page counting and don't perform it again when resuming the processing
feature-request
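One way the request above could be approached is to persist the page counts in a small file and reuse them on the next run. The sketch below is only an illustration and is not taken from pdf2dataset: `cached_page_counts`, `cache_file`, and the injected `count_pages` callable are hypothetical names, and JSON is an assumed storage format.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def cached_page_counts(
    paths: Iterable[Path],
    cache_file: Path,
    count_pages: Callable[[Path], int],  # hypothetical page counter, supplied by the caller
) -> Dict[str, int]:
    """Return page counts per path, reusing counts stored in cache_file."""
    # Load counts saved by a previous (possibly interrupted) run.
    counts: Dict[str, int] = {}
    if cache_file.exists():
        counts = json.loads(cache_file.read_text())

    for path in paths:
        key = str(path)
        if key not in counts:
            # Only documents missing from the cache are counted.
            counts[key] = count_pages(path)

    # Save the merged counts so a resumed run can skip the counting step.
    cache_file.write_text(json.dumps(counts))
    return counts
```

A resumed run would call the function with the same `cache_file` and only pay the counting cost for documents added since the last run.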
When IO is too slow, it is probably a good idea to start processing before the page counting ends:
- [ ] When `chunksize` is provided, start processing...
enhancement
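A possible reading of the issue above is that page counting and processing could overlap if tasks are produced lazily and handed out in chunks. The following is a minimal sketch under that assumption, not the library's implementation: `iter_tasks`, `iter_chunks`, and the `count_pages` callable are hypothetical, and only the name `chunksize` comes from the issue text.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Task = Tuple[str, int]  # (document path, 1-based page number)


def iter_tasks(paths: Iterable[str], count_pages: Callable[[str], int]) -> Iterator[Task]:
    """Yield one task per page, counting each document only when it is reached."""
    for path in paths:
        for page in range(1, count_pages(path) + 1):
            yield path, page


def iter_chunks(tasks: Iterable[Task], chunksize: int) -> Iterator[List[Task]]:
    """Group tasks into lists of at most chunksize items, pulling lazily."""
    it = iter(tasks)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk
```

Because both functions are generators, the first chunk is available as soon as enough pages have been counted to fill it, so a worker pool could begin on it while later documents are still uncounted.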
As we have many non-Python dependencies, having a ready-to-use `Dockerfile` would be very handy.
enhancement
good first issue
Currently, a document-level feature (one whose value is the same for all pages) is extracted again for each page.
enhancement
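The enhancement above amounts to computing a document-level value once and attaching it to every page row. The sketch below illustrates that idea with hypothetical names (`extract_rows`, `page_feature`, `doc_feature`); it does not reflect how pdf2dataset defines or registers features.

```python
from typing import Any, Callable, Dict, List


def extract_rows(
    path: str,
    num_pages: int,
    page_feature: Callable[[str, int], Any],  # hypothetical per-page extractor
    doc_feature: Callable[[str], Any],        # hypothetical per-document extractor
) -> List[Dict[str, Any]]:
    """Build one row per page, evaluating the document-level feature only once."""
    doc_value = doc_feature(path)  # computed a single time for the whole document
    return [
        {
            "path": path,
            "page": page,
            "page_feature": page_feature(path, page),
            "doc_feature": doc_value,  # same value repeated on every page row
        }
        for page in range(1, num_pages + 1)
    ]
```

The resulting list of dictionaries has one entry per page, so it could be passed directly to a DataFrame constructor.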
- [ ] Document routines
- [ ] Document module
- [ ] Document classes
documentation