
Highlight stream_request_body in documentation

Open CLKBlu3 opened this issue 3 years ago • 4 comments

I am pretty sure this has been addressed somewhere else but I can't find a solution, so I am asking for help here.

I have a POST handler to upload files to my server, as follows:

class uploadHandler(BaseHandler):
    def post(self):
        if checkForwardedHeader(self.request.headers):
            posts.upload(self.request.files['file'][0])
        else:
            raise tornado.web.HTTPError(403)  # Forbidden

(BaseHandler is a custom RequestHandler that enables POST, GET and OPTIONS, nothing else, so it is not relevant here.)

def upload(filenames):
    try:
        path = xxxxxxxxx  # custom path using the filenames['filename'] value
        filebody = filenames['body']

        output_file = open(path, 'wb')
        output_file.write(filebody)

        # close file
        output_file.close()
    except Exception as e:
        ..............  # capturing possible errors

This works completely fine with small files (< 100 MB), no problems at all.

However, it does not work for larger files, even though Nginx's client_max_body_size is set to 800M.

For large files, uploads return:

[I 201110 12:37:48 http1connection:289] Malformed HTTP message from 127.0.0.1: Content-Length too long

with Nginx capturing the following:

[error] 838512#838512: *3332387 readv() failed (104: Connection reset by peer) while reading upstream, client: 127.0.0.1, server: , request: "POST /upload HTTP/1.0", upstream: "http://127.0.0.1:8888/upload", host: "xxxxxx.com", referrer: "https://xxxxx.com"

So I decided to run the application with a custom max_buffer_size bigger than 100 MB to see if that was the issue, since the docs (https://www.tornadoweb.org/en/stable/httpclient.html?highlight=upload) say the default buffer size is 100 MB:

http_server = tornado.httpserver.HTTPServer(application, max_buffer_size=800000000) # 800MB buffer
http_server.listen(options.port)
tornado.ioloop.IOLoop.current().start()

Upon doing this, the error message changed to 502 Bad Gateway, and the webserver immediately crashed with a "Killed" message when uploading a file. Nginx logs, on the other hand, show the following captured error:

2020/11/10 12:34:41 [error] 838512#838512: *3309856 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: , request: "POST /upload HTTP/1.0", upstream: "http://127.0.0.1:8888/upload", host: "xxxxx.com", referrer: "https://xxxxx.com"

Everything works fine for uploads < 100 MB, but fails even for 101 MB :)

How should I actually approach big file uploads with Tornado?

CLKBlu3 avatar Nov 10 '20 13:11 CLKBlu3

use @stream_request_body: https://www.tornadoweb.org/en/stable/web.html#tornado.web.stream_request_body
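
A minimal sketch of the shape of such a handler (the handler name and destination path below are only placeholders, and this streams the raw request body to disk instead of parsing multipart form data):

import tornado.web

@tornado.web.stream_request_body
class StreamingUploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # called once the headers have arrived, before any body data
        self.out = open('/tmp/upload.part', 'wb')  # placeholder path

    def data_received(self, chunk):
        # called repeatedly as body chunks arrive, instead of buffering
        # the whole body in memory
        self.out.write(chunk)

    def post(self):
        # called after the last chunk has been received
        self.out.close()
        self.finish('OK')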

ploxiln avatar Nov 10 '20 16:11 ploxiln

I did manage to fix it, thanks for the hint. Anyway, I think the docs/demo should explain and exemplify the "large files" scenario, as I see it's a recurring issue.

Here is what I did, in case you want to add it as an example. It's based on a Stack Overflow example, so you should be free to do so:

@tornado.web.stream_request_body
class uploadHandler(BaseHandler):
    def initialize(self):
        self.meta = dict()
        self.receiver = self.get_receiver()

    def data_received(self, chunk):
        self.receiver(chunk)

    def get_receiver(self):
        index = 0
        SEPARATE = b'\r\n'
        _UPLOADS_ = '/uploads'  # uploads folder

        def receiver(chunk):
            nonlocal index
            if index == 0:
                # first chunk: parse the multipart boundary, headers and
                # filename, then write the rest of the chunk to disk
                index += 1
                split_chunk = chunk.split(SEPARATE)
                self.meta['boundary'] = SEPARATE + split_chunk[0] + b'--' + SEPARATE
                self.meta['header'] = SEPARATE.join(split_chunk[0:3])
                self.meta['header'] += SEPARATE * 2
                self.meta['filename'] = split_chunk[1].split(b'=')[-1].replace(b'"', b'').decode()
                chunk = chunk[len(self.meta['header']):]  # actual file data
                path = _UPLOADS_ + '/' + self.meta['filename'].lower()

                self.fp = open(path, 'wb')
                self.fp.write(chunk)
            else:
                # subsequent chunks are appended as-is
                self.fp.write(chunk)
        return receiver

    def post(self, *args, **kwargs):
        # the closing boundary was written to the file along with the data,
        # so truncate the file back to the real content length before closing
        self.meta['content_length'] = (int(self.request.headers.get('Content-Length'))
                                       - len(self.meta['header'])
                                       - len(self.meta['boundary']))
        self.fp.seek(self.meta['content_length'], 0)
        self.fp.truncate()
        self.fp.close()
        self.finish('OK')

CLKBlu3 avatar Nov 11 '20 13:11 CLKBlu3

There is already a file upload demo here. It is also mentioned in passing in the documentation, though I think it could be made more prominent (somewhere in the User's Guide section, for example).

mivade avatar Nov 11 '20 13:11 mivade

Yeah, this isn't documented very well unless you read through the full tornado.web module docs to see what's there. It does deserve to be called out somewhere more prominent, such as the user's guide. The max_buffer_size parameter also needs to be documented, and that documentation should point out that you almost never want to increase it - you may want to decrease it to control memory usage more tightly, but if you need more you really need stream_request_body instead.
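
A sketch of that combination (handler name, sizes and port are illustrative, not a recommended configuration): keep the server-wide buffer small, and let a streaming handler opt in to large bodies via set_max_body_size, roughly as the file upload demo does.

import tornado.httpserver
import tornado.ioloop
import tornado.web

@tornado.web.stream_request_body
class BigUploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # raise the per-request body limit for this handler only;
        # everything else keeps the (smaller) server-wide limit
        self.request.connection.set_max_body_size(800 * 1024 * 1024)
        self.bytes_seen = 0

    def data_received(self, chunk):
        self.bytes_seen += len(chunk)  # write or process the chunk here

    def post(self):
        self.finish('received %d bytes' % self.bytes_seen)

application = tornado.web.Application([(r'/upload', BigUploadHandler)])
http_server = tornado.httpserver.HTTPServer(
    application,
    max_buffer_size=10 * 1024 * 1024,  # deliberately small in-memory buffer
)
http_server.listen(8888)
tornado.ioloop.IOLoop.current().start()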

bdarnell avatar Nov 19 '20 02:11 bdarnell