textract
Extract text directly from file-object / file-content rather than using filename
Maybe this is already possible?
How would I go about extracting text from the content of a file, rather than having textract read the file from disk? The background is that Dash's upload component gives you the content of the file rather than a pointer to the file's location.
Perhaps this is already possible by calling one of textract's internal functions and specifying an extension?
Unfortunately, there's no way to do that, because textract sometimes launches external commands (most notably pdftotext) to process files.
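Since textract works from a filename, one possible workaround is to write the uploaded content to a temporary file and pass that path to `textract.process`. The sketch below is only an illustration, not something textract provides: `decode_dash_contents` and `extract_text_from_bytes` are hypothetical helper names, and it assumes the Dash upload string has the usual `data:<mime>;base64,<payload>` form and that textract picks its parser from the file suffix.

```python
import base64
import os
import tempfile


def decode_dash_contents(contents):
    """Decode a Dash upload string of the form 'data:<mime>;base64,<payload>'."""
    # Everything after the first comma is assumed to be the base64 payload.
    _header, _, payload = contents.partition(",")
    return base64.b64decode(payload)


def extract_text_from_bytes(data, extension, process=None):
    """Write `data` to a temporary file and run a filename-based extractor on it.

    `process` defaults to textract.process; it is a parameter only so the
    helper can be exercised without textract installed.
    """
    if process is None:
        import textract  # imported lazily so the helper itself has no hard dependency
        process = textract.process

    # Keep the extension on the temporary file so the extractor can
    # recognise the file type from its name.
    suffix = extension if extension.startswith(".") else "." + extension
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The file is closed before the extractor (or any external
        # command it launches) opens it.
        return process(path)
    finally:
        os.remove(path)
```

In a Dash callback this might be used as `extract_text_from_bytes(decode_dash_contents(contents), os.path.splitext(filename)[1])`, where `contents` and `filename` come from the upload component. The temporary file is removed again once extraction finishes, whether or not it succeeds.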