Suggestion: Add support for .pdf files

Open Hala-Hamdoun opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Which filetype should textract support?
A clear and concise description of file types you think textract should be able to process.

Which external software (python or command line tool), can parse the requested file type
A clear and concise description of tools that can parse the desired filetype.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Hala-Hamdoun avatar Mar 31 '24 22:03 Hala-Hamdoun

PDF support is already included

StevenMapes avatar Dec 27 '24 12:12 StevenMapes

If you're receiving an error like the one below, check that you're running textract with a Python version earlier than 3.12.

The filename extension .pdf is not yet supported by
textract. Please suggest this filename extension here:

    https://github.com/deanmalmgren/textract/issues

Available extensions include: .csv, .doc, .docx, .eml, .epub, .gif, .htm, .html, .jpeg, .jpg, .json, .log, .mp3, .msg, .odt, .ogg, .pdf, .png, .pptx, .ps, .psv, .rtf, .tab, .tff, .tif, .tiff, .tsv, .txt, .wav, .xls, .xlsx

damonmcminn avatar Jan 24 '25 10:01 damonmcminn
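
For anyone wanting to guard against this in a script, here is a minimal sketch of the version check suggested above. The 3.12 cutoff is taken from the previous comment rather than from the textract docs, and the helper names `python_supported` and `extract_pdf_text` are made up for illustration. It assumes textract and its PDF dependencies are installed.

```python
import sys


def python_supported(version_info=sys.version_info, first_unsupported=(3, 12)):
    """Return True if the interpreter is older than the first unsupported version.

    The (3, 12) default is an assumption based on the comment above.
    """
    return tuple(version_info[:2]) < first_unsupported


def extract_pdf_text(path):
    """Extract text from a PDF with textract, failing early on a newer Python."""
    if not python_supported():
        raise RuntimeError(
            "textract may not detect its parsers on Python %d.%d; "
            "try an interpreter older than 3.12" % sys.version_info[:2]
        )
    # Imported lazily so the version check runs before textract is loaded.
    import textract

    # textract.process returns bytes; decode to get a str.
    return textract.process(path).decode("utf-8")
```

Calling `extract_pdf_text("example.pdf")` (the filename is a placeholder) then either returns the extracted text or raises a clearer error than the "not yet supported" message.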