textract
Textract is unable to find the file when used inside Flask
I am trying to read the contents of files such as .txt, .docx, .pdf and so on with textract. When I use the code below, it throws an error:
    @app.route('/upload', methods=['POST'])
    def upload():
        file = request.files['file']
        dt = file.read()
        result = textract.process(dt)
        return (result)
When I uploaded a .docx file, I got:
    File "/usr/lib/python2.7/genericpath.py", line 26, in exists
        os.stat(path)
    TypeError: stat() argument 1 must be encoded string without null bytes, not str
It seems that textract is unable to find the input file "dt". As you can see, I am using it in Flask.
I tried the suggested fix, pip install chardet==2.1.1, from https://github.com/deanmalmgren/textract/issues/107 and also checked https://github.com/deanmalmgren/textract/issues/133
Any help please?
Same problem. For me it works when I run Flask with Gunicorn, but it fails for no apparent reason when I use Nginx + Gunicorn + Flask. The crash is on the line where textract "opens" the file: textract.process(dt)
BTW, I used the "werkzeug" module to save the uploaded file to the server. Maybe this is your problem, and maybe this will work for you:
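    import os
    from werkzeug.utils import secure_filename

    @app.route('/upload', methods=['POST'])
    def upload():
        file = request.files['file']
        filename = secure_filename(file.filename)
        # Save the upload to disk first; textract.process() expects a file path.
        # UPLOAD_FOLDER is an assumed config key pointing at a writable directory.
        path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        file.save(path)
        result = textract.process(path)
        return result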
The path you are giving is not an absolute path. Try converting it to an absolute path before passing it to textract.process(path).
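To make the absolute-path suggestion concrete, here is a minimal sketch of one way to do it: write the uploaded bytes to a temporary file and hand textract the absolute path of that file. The helper names save_upload_to_temp and extract_text are hypothetical (they are not part of textract or Flask), and the sketch assumes the upload is small enough to read into memory. The original extension is kept because textract chooses its parser from the file extension.

```python
import os
import tempfile


def save_upload_to_temp(data, original_name):
    """Write uploaded bytes to a temporary file and return its absolute path.

    Hypothetical helper: only the extension of the client-supplied name is
    reused, so the rest of that name never reaches the filesystem.
    """
    ext = os.path.splitext(original_name)[1].lower()
    fd, path = tempfile.mkstemp(suffix=ext)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return os.path.abspath(path)


def extract_text(data, original_name, extractor):
    """Run `extractor` (for example textract.process) on a temp copy of the upload.

    The temporary file is removed afterwards, even if the extractor raises.
    """
    path = save_upload_to_temp(data, original_name)
    try:
        return extractor(path)
    finally:
        os.remove(path)
```

Inside the Flask view this would probably be called as extract_text(file.read(), file.filename, textract.process), so that textract receives a path rather than the raw file contents.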