amazon-textract-textractor
Issue with multipage PDFs on S3 stored without a file extension
Hello, and first of all, thanks for the awesome package.
I am currently having an issue running textractor on my PDFs, which are stored in S3.
The issue stems from the fact that all my files are stored under UUIDs instead of their actual filenames (for security and other reasons, which I think is pretty common practice at larger enterprises). Because the keys have no file extension, call_textract goes through the entire process without hitting any of its if statements and just returns an empty dict.
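A minimal sketch of the failure mode described above, assuming the caller picks its code path purely from the S3 key's file extension. The function name choose_api_by_extension and its return labels are hypothetical and are not the library's actual code.

```python
import os


def choose_api_by_extension(s3_key: str) -> str:
    """Hypothetical dispatcher that decides how to call Textract from the key's extension."""
    ext = os.path.splitext(s3_key)[1].lower()
    if ext in (".pdf", ".tif", ".tiff"):
        return "async"  # PDF/TIFF may be multipage, which Textract handles via asynchronous jobs
    if ext in (".png", ".jpg", ".jpeg"):
        return "sync"  # single images can use the synchronous API
    return "none"  # no branch matched, so nothing is called and the result stays empty


print(choose_api_by_extension("invoices/2021/report.pdf"))  # async
print(choose_api_by_extension("1b4e28ba-2fa1-11d2-883f-0016d3cca427"))  # none
```

A UUID key contains no dot, so os.path.splitext returns an empty extension and every check falls through.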
Is there any way this use case could be supported?
@lvieirajr Makes sense. I'll add a flag to force a specific MIME type.
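One way such a flag could work, sketched under assumptions: an explicit mime_type argument takes precedence over the key's extension, and for extension-less keys the type can be guessed from the object's leading magic bytes. The names choose_api, sniff_mime and the MIME groupings are hypothetical illustrations, not the caller's real signature.

```python
import os
from typing import Optional

# Assumed MIME-type groups; the real library may classify types differently.
ASYNC_TYPES = {"application/pdf", "image/tiff"}
SYNC_TYPES = {"image/png", "image/jpeg"}
EXT_TO_MIME = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def sniff_mime(header: bytes) -> Optional[str]:
    """Guess a MIME type from a file's leading magic bytes."""
    if header.startswith(b"%PDF-"):
        return "application/pdf"
    if header.startswith((b"II*\x00", b"MM\x00*")):  # little- and big-endian TIFF
        return "image/tiff"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    return None


def choose_api(s3_key: str, mime_type: Optional[str] = None) -> str:
    """Hypothetical dispatcher: an explicit mime_type overrides the key's extension."""
    resolved = mime_type or EXT_TO_MIME.get(os.path.splitext(s3_key)[1].lower())
    if resolved in ASYNC_TYPES:
        return "async"
    if resolved in SYNC_TYPES:
        return "sync"
    # Fail loudly instead of silently returning an empty result.
    raise ValueError(f"cannot determine document type for {s3_key!r}; pass mime_type")


key = "1b4e28ba-2fa1-11d2-883f-0016d3cca427"
header = b"%PDF-1.7\n"  # e.g. the first bytes of the S3 object
print(choose_api(key, mime_type=sniff_mime(header)))  # async
```

In practice the header bytes could be fetched with boto3's get_object and a Range such as "bytes=0-7", which is enough to cover the 8-byte PNG signature. Raising an error when no type can be resolved is a design choice here; it surfaces the problem instead of returning an empty dict.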
Published in version 0.2.2 of the caller. Assigning this to @Belval to add the same ability to Textractor.