
Upload component: Limit number of connections opened to server, upload in batches

Open bwajtr opened this issue 1 year ago • 6 comments

Describe your motivation

On a customer project, the customer needed to implement a mass-data import page. The import process consisted of two stages:

  1. Upload a single Excel file, containing one entity record per row. Each of these entities could reference an image file (imagine an e-shop record with a photo of the product)
  2. Upload image files, referenced by the Excel

The problem in this particular case was that each uploaded image has to be processed by a microservice (resizing, normalizing, etc.), so the file upload request takes much longer than the file transfer alone.

We used the standard Vaadin Upload component to upload the images, but the customer reported issues and errors when trying to upload 1600 images (!).

The issue here is that the customer clicked the "Upload Files..." button, selected 1600 images at once and clicked OK. At that moment, 1600 connections are opened to the server. Because of the slow upload and slow server-side processing of each image, some of the connections have to wait minutes before the upload really starts, which leads to errors, timeouts and similar issues.

I understand that the Upload component was probably not designed for such a large number of files, but it's still a far better option to use the Upload component than to go with a custom solution.

Describe the solution you'd like

I'd still want the Upload component to be in Auto-Upload mode, but to open only, for example, 6 connections to the server at once (or have the limit configurable), so that connections do not have to wait so long. For example, when 12 files are added for upload, only 6 upload requests for the first 6 files are started immediately; the other 6 files wait in the queue until one of the earlier uploads finishes (successfully or not).
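To make the requested behaviour concrete, here is a minimal sketch of such a queue, independent of the Upload component. The names `createUploadQueue` and `startUpload` are hypothetical and not part of `@vaadin/upload`; `startUpload` stands for whatever actually sends a file and is assumed to call `done` once the request ends, whether it succeeded or failed.

```javascript
// Hypothetical helper, not part of @vaadin/upload: keeps at most `limit`
// uploads running and starts the next queued one whenever a slot frees up.
function createUploadQueue(limit, startUpload) {
    const queued = [];
    let active = 0;

    function pump() {
        while (active < limit && queued.length > 0) {
            const file = queued.shift();
            active++;
            let finished = false;
            // `startUpload` is expected to call `done` on success or on error
            startUpload(file, () => {
                if (finished) return; // ignore duplicate completion signals
                finished = true;
                active--;
                pump();
            });
        }
    }

    return {
        // Adds files to the queue and immediately fills any free slots
        add(files) {
            queued.push(...files);
            pump();
        },
        get activeCount() { return active; },
        get queuedCount() { return queued.length; },
    };
}
```

With a limit of 6 and 12 added files, the first 6 start right away and each completion starts exactly one more.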

And/or at least add a nicer API for listening to an event when new files have been added to the queue. It's doable with the current API too, but it does not look that nice. This new listener would come in handy, because we were able to implement the above solution by:

  1. Setting Upload to "noAuto" mode
  2. Using JavaScript to start the upload of the next file when the previous file has finished uploading

But I would appreciate it if something like that were built directly into the Upload component, or if the component provided an API that makes implementing the following code easier:

import {Upload} from '@vaadin/upload'

// noinspection JSUnusedGlobalSymbols
class BatchUploadEnhancer {

    /**
     * @param {Upload} upload - The upload component to enhance
     */
    startUploadingNextQueuedFile(upload) {
        const nextFileToUpload = upload.files.find(file => file.held)
        if (nextFileToUpload) {
            upload.uploadFiles(nextFileToUpload)
        }
    }

    startUploadingBatch(upload, batchSize) {
        // finds the first `batchSize` queued (held) files
        const filesToUpload = upload.files.filter(file => file.held).slice(0, batchSize)
        if (filesToUpload && filesToUpload.length > 0) {
            upload.uploadFiles(filesToUpload)
        }
    }

    containsQueuedFiles(upload) {
        return upload.files.some(file => file.held)
    }

    /**
     * @param {Upload} upload - The upload component to enhance
     * @param {number} batchSize - The maximum number of images to upload at once. This is to prevent the server from being overloaded or connections from timing out
     */
    initUploadComponent(upload, batchSize) {
        // disable automatically opening a connection to upload the file
        upload.noAuto = true;

        // start uploading next file in queue when a file is successfully uploaded
        upload.addEventListener('upload-success', () => {
            this.startUploadingNextQueuedFile(upload)
        });

        // start uploading next file in queue also when there is an error when uploading the file
        upload.addEventListener('upload-error', () => {
            this.startUploadingNextQueuedFile(upload)
        });

        // when some new files are added -> start uploading in batch
        // There is unfortunately no better way to detect that new files were added than to observe changes on the "files" property.
        // When "event.detail.path" is not defined, it means that a file was either added to or removed from the "files" array.
        // When file is added, we set the timeout to start uploading the batch - that way all the files which were added
        // using a file dialog will be already present in the "files" array when the startUploadingBatch method is called.
        upload.addEventListener('files-changed', (event) => {
            if (!event.detail.path && !upload.batchUploadStarter && this.containsQueuedFiles(upload)) {
                upload.batchUploadStarter = setTimeout(() => {
                    upload.batchUploadStarter = null;
                    this.startUploadingBatch(upload, batchSize)
                })
            }
        });
    }
}

window.batchUploadEnhancerModule = new BatchUploadEnhancer();
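
One caveat with the code above: if more files are added while earlier uploads are still in flight, `startUploadingBatch` starts another full batch, so more than `batchSize` requests can run at once. A possible refinement is to fill only the free slots. The sketch below assumes that in-flight entries in `upload.files` carry a truthy `uploading` flag; that is an assumption to verify against the `@vaadin/upload` version in use, and `selectFilesToStart` is a hypothetical helper name.

```javascript
// Sketch only: assumes in-flight entries have `uploading === true`
// and queued entries have `held === true`.
function selectFilesToStart(files, limit) {
    // Count uploads that already occupy a slot
    const inFlight = files.filter((file) => file.uploading).length;
    const freeSlots = Math.max(0, limit - inFlight);
    // Pick only as many queued files as there are free slots
    return files.filter((file) => file.held).slice(0, freeSlots);
}
```

Inside `startUploadingBatch`, the result of `selectFilesToStart(upload.files, batchSize)` could then be passed to `upload.uploadFiles(...)` whenever it is non-empty.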

This is then used in the Java code as:

private static final int IMAGE_UPLOAD_BATCH_SIZE = 4;

uploadComponent.getElement().executeJs("window.batchUploadEnhancerModule.initUploadComponent(this, $0)", IMAGE_UPLOAD_BATCH_SIZE);

Describe alternatives you've considered

Another option is to pack those 1600 images into a ZIP file and upload that ZIP file instead of separate images. We did that, but faced non-functional constraints in production:

  1. The web server did not allow uploading files bigger than 10 MB for security reasons
  2. The ZIP file was 250 MB, and 800 MB unpacked. We unpacked it in memory on the server, which raised memory demands by over 1 GB for a short time, leading to memory issues and degraded performance

The customer simply asked us to find another solution.

Additional context

No response

bwajtr avatar Oct 26 '23 08:10 bwajtr

The maximum number of uploads can be configured on the server with a custom https://github.com/vaadin/flow/blob/8742a8ba19c3a8de51ec4fa2b4a8bbc441fe7157/flow-server/src/main/java/com/vaadin/flow/server/communication/StreamReceiverHandler.java - there is also a "how to" instruction if you look at the catch block for too many files

knoobie avatar Oct 26 '23 10:10 knoobie

Thanks @knoobie, but if you mean the "how to" described in the limitInfoStr variable in that file, then these limits are not what I have a problem with. The problem is that the Upload web component (in the browser) opens too many connections at once, immediately after the files are queued for upload (in Auto-Upload mode). I'm looking for a change in the web component here, not on the server side. At least I don't see how doing anything on the server would prevent the web component from opening that many connections at once.

bwajtr avatar Oct 26 '23 10:10 bwajtr

It would not stop the web component directly; some uploads would just fail and have to be triggered again. I just wanted to point this out, especially since your application sounds like a prime example of where the server should (also) throttle the possible upload count.

knoobie avatar Oct 26 '23 10:10 knoobie

Makes me think that perhaps we should have some reasonable default file-count limit even if we manage to introduce some kind of batching. That way at least the developer would need to make a conscious decision to allow uploading of 1600 files simultaneously, and perhaps take a minute to consider the consequences that may have on the receiving end.

rolfsmeds avatar Oct 27 '23 09:10 rolfsmeds

I recently re-wrote my re-write of the server side Upload component. During that effort I was also missing this feature. I think it makes no sense at all to try to parallelise the file uploads, and at the very least there should be a sane maximum number of concurrent uploads. In the ancient times of internet dinosaurs, browsers used to have a limit of two connections, but not anymore...

I think I'll derive something from the solution from @bwajtr to my "proper upload" component...

mstahv avatar Jan 10 '24 07:01 mstahv

Here is my solution:

https://github.com/viritin/flow-viritin/commit/e50dcfe2d55eeadac182f0f41d6314637f818c3e

The next question will probably be (and a related one, if this actually gets on the desk): how to estimate the remaining time until "all uploads are done". This is already somewhat of a problem today when downloading multiple files: n+1 separate estimates are not reliable because, e.g., small files might finish early, which speeds up the other XHRs.
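
One possible direction, sketched here under assumptions, is to compute a single estimate from aggregate throughput instead of n separate per-file estimates. `estimateRemainingSeconds` is a hypothetical helper; each entry is assumed to expose `size` and `loaded` in bytes. It only models transfer time, so slow server-side processing after the transfer would not be reflected.

```javascript
// Hypothetical helper: estimates seconds left for the whole queue from
// aggregate throughput. Each entry needs `size` (bytes) and optionally
// `loaded` (bytes transferred so far; missing means 0).
function estimateRemainingSeconds(files, elapsedSeconds) {
    const totalBytes = files.reduce((sum, f) => sum + f.size, 0);
    const loadedBytes = files.reduce((sum, f) => sum + (f.loaded || 0), 0);
    if (loadedBytes <= 0 || elapsedSeconds <= 0) {
        return null; // not enough data for an estimate yet
    }
    const bytesPerSecond = loadedBytes / elapsedSeconds;
    return (totalBytes - loadedBytes) / bytesPerSecond;
}
```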

mstahv avatar Jan 10 '24 09:01 mstahv