starlette icon indicating copy to clipboard operation
starlette copied to clipboard

Feature: Add size attribute to UploadFile

Open rafalp opened this issue 3 years ago • 10 comments

This PR adds size attribute to UploadFile, enabling for easy retrieval of uploaded file's size in bytes without need for seeking file's end and telling its size. This is achieved by setting initial size = 0 attribute on file and increasing it within write via self.size += len(data).

UploadFile also supports file argument for bytes stream. This seems like its used in tests only so I've resolved to writing naive size resolver that seeks stream's end and telling its size. This implementation is unlikely to work in production though, so maybe we should have size argument on constructor or make async def size(self) -> int as getter instead?

rafalp avatar Jan 11 '22 20:01 rafalp

UploadFile also supports file argument for bytes stream. This seems like its used in tests only so I've resolved to writing naive size resolver that seeks stream's end and telling its size. This implementation is unlikely to work in production though, so maybe we should have size argument on constructor of make async def size(self) -> int as getter instead?

That's what I was thinking before reading the description (I've jumped into the code before reading the description). 🤔

Kludex avatar Jan 11 '22 21:01 Kludex

Would there be any cons to making it a constructor parameter? Alternatively, do we maybe want to remove the unused file constructor parameter?

Edit: I don't think that would work. UploadFile is used to store streaming data in multipart requests. We don't know the size of the field until after we create the UploadFile.

What we could do is have:

class UploadFile:
    def __init__(
        self,
       file: Optional[BinaryIO] = None,
       file_size: Optional[int] = None,
    ) -> None:
         if file:
             self.size = file_size or 0
        else:
             self.size = 0

    async def write(self, data: bytes) -> None:
         self.size += len(data)
         ...

So for the edge case where the user provides a file like object, if they provide the size we use that as a starting point, otherwise we assume 0.

adriangb avatar Jan 11 '22 21:01 adriangb

Seeing that size is calculated attribute, I would rather drop file and update those few tests that use it to always write() like how formparser does it anyway.

It shouldn't be a problem seeing how file is not documented anywhere as public API?

rafalp avatar Jan 11 '22 21:01 rafalp

Agreed, I'm +1 for dropping the file parameter. But I think that should be its own PR, and should have input from Tom. It's not documented but it is part of the public constructor, so I think it would technically be a breaking change (low risk though)?

adriangb avatar Jan 11 '22 21:01 adriangb

Only scenario for end users doing this that comes to my mind is people creating custom UploadFile instances in their app's tests to see how their upload handling logic is handling them, eg. tests for value validators.

I'm actually doing that but I've resorted to writeing test payload as file was not documented and known to me prior of working on this PR.

rafalp avatar Jan 11 '22 21:01 rafalp

Yep that's exactly the only scenario I can think of. So it wouldn't even break their production, it would break their tests, if anything at all?

adriangb avatar Jan 11 '22 21:01 adriangb

I opened #1406 to evaluate that as an option

adriangb avatar Jan 11 '22 21:01 adriangb

What's an example use-case? What do other Python frameworks do here? (I usually ask for Flask and Django as the two points of comparison)

tomchristie avatar Jan 12 '22 10:01 tomchristie

@tomchristie

Other frameworks:

Django: UploadedFile has size attribute that contains uploaded file's size in bytes. ref Flask: FileStorage has content_length attribute that was extracted from upload's header. ref

Use-case for having size on upload makes it easy to implement and enforce different uploaded file size limits within application's logic. Eg. internet message board may have different upload limit for user profile image and different one for message attachments. Those size limits may be changed further depending on user's trust level. I'm also storing that size as part of attachment's metadata in database for displaying in app's UI.

This logic is relatively painless to implement in Django forms, but requires some research in Starlette.

Naturally this isn't here to protect app from huge uploads. There's NGINX in front of app with 32M payload limit in front of app, but if user uploads 2.1M and portrait photo limit is 2M, I'll rather handle this gracefully displaying them an error in app instead of NGINX's error page.

rafalp avatar Jan 12 '22 10:01 rafalp

I suppose we want to defer this to after #1406 - deps. on https://github.com/encode/starlette/pull/1406#issuecomment-1010921452 for instance.

tomchristie avatar Jan 12 '22 11:01 tomchristie

What's the conclusion?

NeurlAP avatar Jan 03 '23 05:01 NeurlAP

@NeurlAP its on the agenda for next meeting, which should happen next week. I'll post an update when there is one.

rafalp avatar Jan 18 '23 18:01 rafalp

@NeurlAP we've discussed the course of action for this issue and we've decided the following:

  1. Verify that #1413 will not break the FastAPI
  2. Merge the #1413
  3. Update this PR so we accept size as required __init__ argument next to the file - this size will come from the HTTP request parser or whoever which already knows it.
  4. We increase the size when you are writing to the file, to keep it in sync.

So if you want calculated file size you will do the upload.size, but if you want the declared file size from the uploader, you will do the upload.headers["content-length"]

rafalp avatar Jan 23 '23 13:01 rafalp

#1413 should be merged soon. Step 3 above needs to happen afterward, jfyk.

Kludex avatar Feb 04 '23 10:02 Kludex

@rafalp I think you can work on this now :)

adriangb avatar Feb 04 '23 18:02 adriangb

I've updated the PR. Not sure if in tests we want to set initial size to literal value, or do stream.getbuffer().nbytes. But I've went with literal.

rafalp avatar Feb 04 '23 22:02 rafalp

Can we make size optional? It gives us a bit more flexibility as a toolkit.

adriangb avatar Feb 04 '23 23:02 adriangb

I've updated UploadFile to make size optional. If initial value is not provided, then size is no longer calculated.

rafalp avatar Feb 05 '23 01:02 rafalp

I'm going to wait for this PR to release 0.24.0, jfyk.

Kludex avatar Feb 06 '23 06:02 Kludex

@Kludex ok. I guess only thing missing before we can merge this is rebase on master. Unless we want to wait for more reviews?

rafalp avatar Feb 06 '23 11:02 rafalp

I've set the auto-merge - once the pipeline passes, it will be merged.

I'm going to release Starlette 0.24.0 tonight.

Kludex avatar Feb 06 '23 12:02 Kludex