toolbelt icon indicating copy to clipboard operation
toolbelt copied to clipboard

MultipartEncoder assumes total_len(file) is cheap

Open keturn opened this issue 6 years ago • 2 comments

We were uploading files from Django to Box using https://github.com/box/box-python-sdk, which uses requests-toolbelt MultipartEncoder, and things were really slow, orders-of-magnitude slower than they should be, despite assurances that there was no throttling in place between the AWS servers and Box's upload server.

A profile explains why: Thread Profile for Box Upload.pdf

We're spending lots time making size requests on the file being uploaded. Looks like multiple for every pass through the network loop.

This can be real slow if the "file" represents a network resource, and especially with Django's current implementation of the method. See related bug https://code.djangoproject.com/ticket/9631#comment:8

keturn avatar Apr 03 '18 21:04 keturn

Does Part.bytes_left_to_write need to recompute total_len(self.body) every time? That is, does it assume that Part.body may be changing while it's in process, or can we do this only once in __init__ where part.len is set?

This is related to #171 , but I filed this under a separate issue, because it's not manifesting as "infinite loop" here.

In #171 there's a comment:

Is there anyway to determine the remaining amount? [...] As it stands, Django is breaking the way file-like objects work in Python.

but file objects have no support for __len__ at all, and several of the implementations in multipart.encoder.total_len do return the size of the entire file (as is what I would expect of a method called total_len), not something measured relative to the current seek position.

keturn avatar Apr 03 '18 22:04 keturn

Does Part.bytes_left_to_write need to recompute total_len(self.body) every time?

Frankly, I don't remember. It's been years since I touched that part of this code. Do you have time to experiment/git history splunk to determine if it is necessary?

but file objects have no support for len at all,

It seems like you're making assumptions of what total_len does. It's a simple way of grabbing the length of a string/bytes/file/IO object based on what we're doing. I also haven't reviewed #171.

sigmavirus24 avatar Apr 04 '18 15:04 sigmavirus24