File upload appears to hold everything in memory until the end

Open • bertbarabas opened this issue 1 year ago • 7 comments

Describe the bug

  1. For large files, the upload consumes significant memory, and memory use grows even further when the file's I/O buffer is written out.
  2. Uploading is very slow and consumes significant CPU when everything runs on the same host (both Reflex and the browser). Uploading files shouldn't really consume much CPU at all.

To Reproduce

Steps to reproduce the behavior:

  • Just use the standard upload example and upload a multi-gigabyte file.

Expected behavior

Very little memory should be used, because the data should be written to a file incrementally. To avoid rewriting the file after the upload, I'd expect to be able to specify the target path, or, if a temporary file is used, to be able to rename the temporary file to its final location.
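A minimal sketch of the expected behavior described above, not Reflex's actual implementation. It assumes the uploaded file exposes an awaitable read(size) method, like the file object in the reporter's chunked-write snippet. Data is streamed into a temporary file created in the destination directory, and the finished file is moved into place with os.replace, so memory use stays on the order of one chunk and no second full copy of the file is written. The names save_upload, AsyncReadable and FakeUpload are hypothetical and are not part of the Reflex API.

```python
import asyncio
import io
import os
import tempfile
from pathlib import Path
from typing import Protocol


class AsyncReadable(Protocol):
    """Assumed interface: any object with an awaitable read(size) method."""

    async def read(self, size: int = -1) -> bytes: ...


async def save_upload(
    file: AsyncReadable, target: Path, chunk_size: int = 1_000_000
) -> int:
    """Stream `file` to `target` chunk by chunk; return the number of bytes written."""
    target = Path(target)
    # Create the temp file next to the target so os.replace() is a
    # same-filesystem rename rather than a second full copy.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".upload-", suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as out:
            # Memory use stays on the order of chunk_size, whatever the file size.
            while chunk := await file.read(chunk_size):
                out.write(chunk)
                written += len(chunk)
        # Move the finished file to its final home (replaces any existing file).
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial temp file on any failure or cancellation.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return written


class FakeUpload:
    """Hypothetical stand-in for an uploaded file, serving bytes from memory."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    async def read(self, size: int = -1) -> bytes:
        return self._buf.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        dest = Path(d) / "upload.bin"
        n = asyncio.run(save_upload(FakeUpload(b"x" * 2_500_000), dest))
        print(n, dest.stat().st_size)  # prints: 2500000 2500000
```

Creating the temporary file in the target's own directory is the design choice that makes the final rename cheap: os.replace between directories on the same filesystem only updates directory entries, whereas a temp file on a different filesystem would make os.replace fail with an OSError.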

Specifics (please complete the following information):

  • Python Version: 3.12.4
  • Reflex Version: 0.5.4
  • OS: WSL Ubuntu 22.04.4 LTS on Windows 11 pro
  • Browser (Optional): Brave 1.67.116 ( Chromium 126.0.6478.71)

bertbarabas, Jun 19 '24

Another odd behavior of upload: on_upload_progress reports that the upload is 100% complete, but roughly another 10% of the total time passes before rx.upload_files returns and passes the result on to the handle_upload function.

bertbarabas, Jun 28 '24

Can I work on these issues?

jaypatidar14, Aug 06 '24

@jaypatidar14 just assigned you! Let us know if you need any help.

picklelo, Aug 06 '24

Hello! Can I work on this issue?

garv901, Aug 20 '24

@garv901 assigned you!

picklelo, Aug 20 '24

I should point out that the very large memory consumption only happens at the end of the upload, just as the file write is starting.

Memory use is fairly moderate during most of the upload, which I attribute to uvicorn handling the actual upload and writing to a temporary file behind the scenes.

To try to get around this issue, I started writing out the upload in chunks (see below), but it didn't help. If I upload a 4 GB file, memory usage jumps from 1 GB to 5 GB at the end of the upload.

# Stream the uploaded file to disk in 1 MB chunks instead of reading it all at once.
with self.wip_file.open("wb") as file_object:
    while upload_data := await file.read(1_000_000):
        file_object.write(upload_data)
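As a rough check that is not part of the original report, the sketch below uses the standard-library tracemalloc module to compare the peak memory traced while reading a file whole versus copying it in 1 MB chunks. The helper names copy_whole, copy_chunked and peak_python_memory are hypothetical, and local temporary files stand in for a real Reflex upload. If the chunked copy peaks at a couple of megabytes, that would be consistent with the 4 GB jump happening before a loop like the one above ever runs, although this sketch by itself cannot show where the spike occurs.

```python
import tempfile
import tracemalloc
from typing import BinaryIO, Callable

CHUNK_SIZE = 1_000_000  # same 1 MB chunk size as the snippet above


def copy_whole(src: BinaryIO, dst: BinaryIO) -> None:
    """Read the entire source into one bytes object, then write it."""
    dst.write(src.read())


def copy_chunked(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> None:
    """Copy in fixed-size pieces so only about one or two chunks are alive at once."""
    while chunk := src.read(chunk_size):
        dst.write(chunk)


def peak_python_memory(copy: Callable[[BinaryIO, BinaryIO], None], size: int) -> int:
    """Return the peak memory (bytes) traced by tracemalloc while copying `size` bytes."""
    with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
        # Fill the source file on disk *before* tracing starts, so it isn't counted.
        block = b"\0" * CHUNK_SIZE
        for _ in range(size // CHUNK_SIZE):
            src.write(block)
        src.seek(0)
        del block
        tracemalloc.start()
        try:
            copy(src, dst)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return peak


if __name__ == "__main__":
    size = 32_000_000
    print("whole-file read peak:", peak_python_memory(copy_whole, size))
    print("chunked copy peak:   ", peak_python_memory(copy_chunked, size))
```

Exact peak values depend on the Python version and platform, so no expected output is given; the point is the order-of-magnitude gap between the two approaches.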

bertbarabas, Aug 29 '24

I have one more observation that seems to point to gunicorn as the root of the issue. I checked whether uploading was any faster when running Reflex in production mode, and I now see that instead of the python process consuming 100% CPU, it is the gunicorn process.


bertbarabas, Sep 09 '24