Textract is unable to find file when used inside Flask

Open dhinar1991 opened this issue 6 years ago • 3 comments

I am trying to read the contents of files such as .txt, .docx, and .pdf with textract. When I use the code below, it throws an error:


@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    dt = file.read()
    result = textract.process(dt)
    return (result)

When I uploaded a docx file, I got:

File "/usr/lib/python2.7/genericpath.py", line 26, in exists
    os.stat(path)
TypeError: stat() argument 1 must be encoded string without null bytes, not str

It seems that textract is unable to find the input file "dt". As you can see, I am using it in Flask. I tried the solution pip install chardet==2.1.1 from https://github.com/deanmalmgren/textract/issues/107 and also checked https://github.com/deanmalmgren/textract/issues/133

Any help please?

dhinar1991 avatar Apr 12 '18 10:04 dhinar1991

Same problem. For me it works when I run Flask with Gunicorn, but it fails for no apparent reason when I use Nginx+Gunicorn+Flask. The crash is on the line where textract "opens" the file: textract.process(dt)

hhsm95 avatar Aug 14 '18 21:08 hhsm95

BTW, I used the "werkzeug" module to save the uploaded file to the server. Maybe this is your problem, and maybe this will work for you:

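from werkzeug.utils import secure_filename

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    # Sanitize the client-supplied name before using it as a path
    filename = secure_filename(file.filename)
    # Save the upload to disk (here: the current working directory),
    # because textract.process expects a file path, not file contents
    file.save(filename)
    result = textract.process(filename)
    return result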

hhsm95 avatar Aug 14 '18 21:08 hhsm95

The path you are giving is not an absolute path name. Try converting it to an absolute path before passing it to textract.process(path).
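
A minimal sketch of that idea, using only the standard library: write the uploaded bytes to a temporary file and hand its absolute path to textract. The helper name save_upload_to_temp is hypothetical (it is not part of textract or Flask), and the Flask usage in the trailing comments is an untested assumption based on the code earlier in this thread.

```python
import os
import tempfile


def save_upload_to_temp(data, original_name):
    """Write uploaded bytes to a temp file and return its absolute path.

    Hypothetical helper for illustration; not part of textract or Flask.
    """
    # Keep only the extension of the client-supplied name, since
    # textract picks its parser from the file extension.
    suffix = os.path.splitext(os.path.basename(original_name))[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return os.path.abspath(path)


# Assumed usage inside the Flask view (not executed here):
#     upload = request.files['file']
#     path = save_upload_to_temp(upload.read(), upload.filename)
#     try:
#         result = textract.process(path)
#     finally:
#         os.remove(path)
```

Because the path comes from the temp directory rather than the process's working directory, it should not matter whether the app is started by Gunicorn alone or behind Nginx.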

mohammedyunus009 avatar Feb 28 '19 09:02 mohammedyunus009