
Add a max_size parameter to save and _copy_file

Open 06180339 opened this issue 4 years ago • 9 comments

This is a proposal inspired by a requirement I had myself and by the following comment on Stack Overflow: “Bottle stores file uploads in request.files as FileUpload instances, along with some metadata about the upload. FileUpload has a very handy save method that perform the "file slurping" for you. It does not check the maximum file upload though, but IMHO it's better to have nginx in front to perform this kind of limitation check.”

If you don’t use nginx or a similar front-end and want an upper limit on the file size, it would be nice if the save method had an additional max_size parameter to set a maximum size beyond which writing is stopped.

The changes in this branch hopefully enable that, and I would like to propose merging them.
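For illustration only, a standalone helper along these lines captures the idea; the name save_limited and the exact error message are made up for this sketch and are not part of Bottle (nor necessarily identical to the branch):

import os

def save_limited(upload, destination, overwrite=False,
                 chunk_size=2 ** 16, max_size=None):
    # Copy a bottle FileUpload to the path `destination`, raising IOError as
    # soon as more than `max_size` bytes have been read; None disables the limit.
    if os.path.isdir(destination):
        destination = os.path.join(destination, upload.filename)
    if not overwrite and os.path.exists(destination):
        raise IOError('File exists.')
    offset = upload.file.tell()
    written = 0
    with open(destination, 'wb') as fp:
        while True:
            buf = upload.file.read(chunk_size)
            if not buf:
                break
            written += len(buf)
            if max_size is not None and written > max_size:
                # Note: the partially written file is left behind here.
                raise IOError('File exceeds max_size of %d bytes.' % max_size)
            fp.write(buf)
    upload.file.seek(offset)

Used as save_limited(request.files['data'], '/srv/uploads', max_size=10 * 2 ** 20) it rejects anything above 10 MiB; the branch instead threads max_size through save and _copy_file themselves, so no extra helper is needed.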

06180339 avatar Jan 19 '21 15:01 06180339

I like this idea.

It's super minor, but why use a keyword argument on a private method? It certainly doesn't hurt though and, like I said, this is minor.

Jwink3101 avatar Jan 19 '21 16:01 Jwink3101

Thank you. That is a good point – the signature of a private method can be controlled within this library. On the other hand, if max_size in _copy_file is made a positional argument instead of a keyword argument, …

  • either it has to come before chunk_size,
  • or chunk_size itself also has to become a positional argument instead of a keyword argument (which it currently is).

The former has the downside that the argument order in the signature then differs from the order in the signature of save – and save itself should probably not be changed.

The latter might have the downside that the default value of 2 ** 16 would then have to go – that may be insignificant, but maybe the author of this method preferred to have the default value ingrained here as well, not only in save itself.
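To make the trade-off concrete, these are the signature variants in question (illustrative stubs only, not code from the branch):

# (a) max_size positional, before chunk_size: the order no longer matches
#     a save(..., chunk_size=2 ** 16, max_size=None) signature.
def _copy_file(self, fp, max_size, chunk_size=2 ** 16): ...

# (b) both positional: chunk_size loses its 2 ** 16 default here.
def _copy_file(self, fp, chunk_size, max_size): ...

# (c) keyword argument as proposed: order and defaults stay consistent with save.
def _copy_file(self, fp, chunk_size=2 ** 16, max_size=None): ...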

So, I am obviously fine with anything that seems acceptable to the maintainers.

06180339 avatar Jan 19 '21 23:01 06180339

Further possible points of critique (in no particular order):

  • Performance: The difference from the current version is the check if max_size is not None once per chunk. That is not nothing, but it is probably negligible – with many chunks, the time should be dominated by I/O anyway, I guess. A compromise is to avoid any comparatively costly function call on this line – otherwise if isinstance(max_size, int) comes to mind.
  • if isinstance(max_size, int) instead of if max_size is not None? See above. Or would it be significantly more robust?
  • If max_size is exceeded, an IOError is raised – which is also the error for the “File exists” case. The two errors are distinguished only by their messages; it might be better to have a stronger distinction. I am not sure, however, which other error could or should be used for exceeding the maximum size (one possible sketch follows below this list).
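As one illustration of a stronger distinction, a subclass of IOError would stay backwards compatible while letting callers catch the size-limit case separately (the class name below is made up, and max_size assumes the proposed parameter):

class MaxSizeExceededError(IOError):
    # Hypothetical name; would be raised when an upload exceeds max_size.
    pass

def store(upload):
    # `upload` is a bottle FileUpload instance.
    try:
        upload.save('/srv/uploads', max_size=10 * 2 ** 20)
    except MaxSizeExceededError:
        return 'Upload larger than 10 MiB'   # e.g. answer with 413
    except IOError:
        return 'Target file already exists'
    return 'OK'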

06180339 avatar Jan 19 '21 23:01 06180339

This is somewhat related, but I couldn't find information on whether there is a hard limit on file upload size. Can this method handle uploads of 2 GB+ files? It appears to fail in my case on a Raspberry Pi. So, does the current method buffer in RAM? If not, why would it fail on a Raspberry Pi?

knro avatar Apr 07 '22 09:04 knro

Probably because your TMPDIR is an in-memory file system (e.g. tmpfs) and thus limited to your available RAM.

defnull avatar Apr 07 '22 09:04 defnull

It's not; this is a Raspberry Pi with a 16 GB SD card, and /tmp is not size-restricted as far as I know.

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       14989948 6395892   7963776  45% /
devtmpfs         3787940       0   3787940   0% /dev
tmpfs            3952804    8356   3944448   1% /dev/shm
tmpfs            3952804   34904   3917900   1% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs            3952804       0   3952804   0% /sys/fs/cgroup
/dev/mmcblk0p1    258095   49336    208760  20% /boot
tmpfs             790560       8    790552   1% /run/user/1000

knro avatar Apr 07 '22 09:04 knro

Oh I see. Which of the tmpfs entries above is used by the Python/Bottle framework? Is it the last one, with 790 MB available? Is this why it fails? What's the workaround for such a case? Thanks!!

knro avatar Apr 07 '22 09:04 knro

Bottle uses cgi.FieldStorage, which uses tempfile.TemporaryFile, which in turn stores temp files in the directory returned by https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir
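To check where those temp files actually end up on a given system (and to redirect them, since tempfile honours the TMPDIR environment variable on Linux):

import tempfile
print(tempfile.gettempdir())   # e.g. /tmp

# Redirect temp files for one process by setting TMPDIR before starting
# the server, e.g.:
#   TMPDIR=/path/to/big/disk python app.py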

Bottle does not do any in-memory buffering of file uploads. The whole request body is buffered in memory only for requests smaller than BaseRequest.MEMFILE_MAX, which is only 100 KB by default. If your temp directory is not size-constrained and you copy the uploaded file chunk by chunk rather than in one step, it should not fail. Please open a new issue with a stack trace.
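For reference, a minimal chunk-by-chunk upload handler looks roughly like this (the route path and the form field name 'data' are placeholders):

from bottle import Bottle, request

app = Bottle()

@app.post('/upload')
def do_upload():
    upload = request.files.get('data')
    if upload is None:
        return 'No file uploaded'
    # FileUpload.save() copies in 64 KiB chunks by default, so the upload
    # never has to fit into memory in one piece.
    upload.save('/srv/uploads', overwrite=True)
    return 'OK'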

defnull avatar Apr 07 '22 10:04 defnull

Thanks. Under Linux, gettempdir() is indeed /tmp, and /tmp does not appear to be mounted on any tmpfs as per the df -h command.

knro avatar Apr 14 '22 04:04 knro