Upload component: Limit number of connections opened to server, upload in batches
Describe your motivation
On a customer project, the customer needed to implement a mass-data import page. The import process consisted of two stages:
- Upload a single Excel file, containing one entity record per row. Each of these entities could reference an image file (imagine an e-shop record with a photo of the product)
- Upload image files, referenced by the Excel
The problem in this particular case was that each uploaded image has to be processed by a microservice (resizing, normalizing, etc.), so the file upload request takes much longer than the file transfer alone.
We used the standard Vaadin Upload component to upload the images, but the customer reported issues and errors when trying to upload 1600 images (!).
The customer clicked the "Upload Files..." button, selected 1600 images at once and clicked OK. At that moment, 1600 connections are opened to the server. Because of the slow upload and the slow server-side processing of each image, some of the connections have to wait minutes before the upload really starts, which leads to errors and timeouts.
I understand that the Upload component was probably not designed for such a large number of files, but using it is still a far better option than going with a custom solution.
Describe the solution you'd like
I'd still want the Upload component to be in Auto-Upload mode, but to open only, for example, 6 connections to the server at once (or have the number configurable), so that connections do not have to wait so long. For example, when 12 files are added for upload, only the upload requests for the first 6 files start immediately; the other 6 files wait in the queue until one of the earlier uploads finishes (successfully or not).
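To make the requested behaviour concrete, here is a minimal sketch of a concurrency-limited queue in plain JavaScript. It is not part of `@vaadin/upload`; `createLimitedQueue`, `activeCount` and `pendingCount` are hypothetical names, and each "task" is assumed to be a function returning a promise that settles when one upload finishes or fails.

```javascript
// Hypothetical helper (not part of @vaadin/upload): runs at most `limit` tasks at once.
function createLimitedQueue(limit) {
  const pending = []; // tasks waiting for a free slot
  let active = 0;     // tasks currently running

  function runNext() {
    if (active >= limit || pending.length === 0) return;
    const { task, resolve, reject } = pending.shift();
    active++;
    // Promise.resolve().then(task) also turns a synchronous throw into a rejection.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        runNext(); // a slot is free again, whether the task succeeded or failed
      });
  }

  return {
    // Queues a task and returns a promise for its result.
    add(task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject });
        runNext();
      });
    },
    get activeCount() { return active; },
    get pendingCount() { return pending.length; },
  };
}
```

With a limit of 6 and 12 queued tasks, 6 start immediately and the other 6 wait until a slot frees up, which is the behaviour asked for above.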
And/or at least add a nicer API for listening to an event fired when new files have been added to the queue. It's doable with the current API too, but it does not look that nice. This new listener would come in handy because we were able to implement the above solution by:
- Setting Upload to "noAuto" mode
- Using JavaScript to start the upload of the next file when the previous file has finished uploading
But I would appreciate it if something like that were built directly into the Upload component, or if the component provided an API that makes the following code easier to implement:
import {Upload} from '@vaadin/upload'

// noinspection JSUnusedGlobalSymbols
class BatchUploadEnhancer {

  /**
   * Starts uploading the first file that is still waiting in the queue, if any.
   * @param {Upload} upload - The upload component to enhance
   */
  startUploadingNextQueuedFile(upload) {
    const nextFileToUpload = upload.files.find(file => file.held)
    if (nextFileToUpload) {
      upload.uploadFiles(nextFileToUpload)
    }
  }

  startUploadingBatch(upload, batchSize) {
    // finds the first `batchSize` queued files
    const filesToUpload = upload.files.filter(file => file.held).slice(0, batchSize)
    if (filesToUpload.length > 0) {
      upload.uploadFiles(filesToUpload)
    }
  }

  containsQueuedFiles(upload) {
    return upload.files.some(file => file.held)
  }

  /**
   * @param {Upload} upload - The upload component to enhance
   * @param {number} batchSize - The maximum number of images to upload at once. This is to prevent the server from being overloaded or connections from timing out
   */
  initUploadComponent(upload, batchSize) {
    // disable automatically opening a connection to upload the file
    upload.noAuto = true;

    // start uploading next file in queue when a file is successfully uploaded
    upload.addEventListener('upload-success', () => {
      this.startUploadingNextQueuedFile(upload)
    });

    // start uploading next file in queue also when there is an error when uploading the file
    upload.addEventListener('upload-error', () => {
      this.startUploadingNextQueuedFile(upload)
    });

    // when some new files are added -> start uploading in batch
    // There is unfortunately no better way to detect that new files were added than to observe changes on the "files" property.
    // When "event.detail.path" is not defined, it means that a file was either added to or removed from the "files" array.
    // When a file is added, we set a timeout to start uploading the batch - that way all the files which were added
    // using a file dialog will already be present in the "files" array when the startUploadingBatch method is called.
    upload.addEventListener('files-changed', (event) => {
      if (!event.detail.path && !upload.batchUploadStarter && this.containsQueuedFiles(upload)) {
        upload.batchUploadStarter = setTimeout(() => {
          upload.batchUploadStarter = null;
          this.startUploadingBatch(upload, batchSize)
        })
      }
    });
  }
}

window.batchUploadEnhancerModule = new BatchUploadEnhancer();
This is then used in the Java code as:
private static final int IMAGE_UPLOAD_BATCH_SIZE = 4;
uploadComponent.getElement().executeJs("window.batchUploadEnhancerModule.initUploadComponent(this, $0)", IMAGE_UPLOAD_BATCH_SIZE);
Describe alternatives you've considered
Another option is to pack those 1600 images into a ZIP file and upload that ZIP file instead of the separate images. We did that, but ran into non-functional constraints in production:
- The web server did not allow uploads of files bigger than 10 MB for security reasons
- The ZIP file was 250 MB (800 MB unpacked), and we unpacked it in memory on the server, which raised the memory demand by over 1 GB for a short time, leading to memory issues and degraded performance
The customer simply asked us to find another solution.
Additional context
No response
The maximum number of uploads can be configured on the server with a custom StreamReceiverHandler (https://github.com/vaadin/flow/blob/8742a8ba19c3a8de51ec4fa2b4a8bbc441fe7157/flow-server/src/main/java/com/vaadin/flow/server/communication/StreamReceiverHandler.java). There is also a "how to" instruction if you look at the catch block for too many files.
Thanks @knoobie, but if you mean the "how to" described in the limitInfoStr variable in that file, then these limits are not what I have a problem with. The problem is that the Upload web component (in the browser) opens too many connections at once, immediately after the files are queued for upload (in Auto-Upload mode). I'm looking for a change in the web component here, not on the server side. At least I don't see how anything done on the server would prevent the web component from opening that many connections at once...
It would not stop the web component directly; some uploads would just fail and have to be triggered again. I just wanted to point this out, especially since your application sounds like the prime example of a case where the server should (also) throttle the possible upload count.
Makes me think that perhaps we should have some reasonable default file-count limit even if we manage to introduce some kind of batching. That way at least the developer would need to make a conscious decision to allow uploading of 1600 files simultaneously, and perhaps take a minute to consider the consequences that may have on the receiving end.
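For reference, an explicit cap can already be set per component through the `maxFiles` property, with rejected files reported through the `file-reject` event. A small sketch, where `limitFileCount` and the `onReject` callback are hypothetical names and `upload` is assumed to be a `<vaadin-upload>` element:

```javascript
// Hypothetical helper: caps the number of files and reports the ones that are rejected.
// `upload` is assumed to be a <vaadin-upload> element.
function limitFileCount(upload, maxFiles, onReject) {
  upload.maxFiles = maxFiles;
  upload.addEventListener('file-reject', (event) => {
    // event.detail carries the rejected file and the reason for the rejection
    onReject(event.detail.file.name, event.detail.error);
  });
}
```

This only limits how many files are accepted; it does not change how many uploads run at the same time.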
I recently re-wrote my re-write of the server-side Upload component. During that effort I was also missing this feature. I think it makes no sense at all to try to parallelise the file uploads, and at the very least there should be a sane maximum number of concurrent uploads. In the ancient times of internet dinosaurs, browsers used to have a limit of two connections, but not anymore...
I think I'll derive something from @bwajtr's solution for my "proper upload" component...
Here is my solution:
https://github.com/viritin/flow-viritin/commit/e50dcfe2d55eeadac182f0f41d6314637f818c3e
The next question (related, if this actually gets on the desk) will probably be how to estimate the remaining time until all uploads are done. This is already something of a problem today when downloading multiple files: with n+1 files the estimates are not reliable, because e.g. small files might finish early, which speeds up the other XHRs.
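One possible approach, sketched here under assumptions, is to estimate from the aggregate throughput of the whole batch instead of summing per-file estimates. `estimateRemainingMs` is a hypothetical helper; it assumes each entry exposes `size` and `loaded` byte counts and that the caller tracks the elapsed time since the batch started. It ignores server-side processing time, so it would underestimate in a setup like the one described in this issue.

```javascript
// Hypothetical helper: estimates the time left for a whole batch from aggregate throughput.
// `files` is an array of { size, loaded } objects (bytes);
// `elapsedMs` is the time since the first upload of the batch started.
function estimateRemainingMs(files, elapsedMs) {
  const totalBytes = files.reduce((sum, f) => sum + f.size, 0);
  const loadedBytes = files.reduce((sum, f) => sum + f.loaded, 0);
  if (loadedBytes === 0 || elapsedMs <= 0) {
    return null; // no throughput data yet
  }
  // remaining bytes divided by the average rate (loadedBytes / elapsedMs)
  return ((totalBytes - loadedBytes) * elapsedMs) / loadedBytes;
}
```

Because the rate is averaged over all transfers, a small file finishing early does not change the estimate as abruptly as it would with per-file estimates.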