dash-uploader
Support for multiple dash-uploader components on same app
In the future, it might be useful to be able to add multiple du.Upload components to the same app. This has some implications:
- First, there should be one Flask route configured for each Upload component.
- The uploads should not clash with one another. Even if a user has three components uploading my_big_data.csv (same filename), all the chunks should be perfectly independent of each other.
- Configuring callbacks must take this into account.
The synchronization problem mentioned in this issue may require modifications to the BaseHttpRequestHandler. Each time we write a chunk file or check whether a chunk file exists, we need to hold a lock so that concurrent writers cannot interfere with each other.
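A minimal sketch of the idea above, assuming a single in-process lock is enough. The names `_chunk_lock` and `write_chunk` are hypothetical and are not taken from dash-uploader's BaseHttpRequestHandler:

```python
import threading
from pathlib import Path

# Hypothetical module-level lock; not part of dash-uploader.
_chunk_lock = threading.Lock()


def write_chunk(chunk_path: Path, data: bytes) -> bool:
    """Write a chunk unless it already exists. Returns True if written."""
    with _chunk_lock:
        # The existence check and the write happen under the same lock,
        # so two threads cannot both conclude that the chunk is missing.
        if chunk_path.exists():
            return False
        chunk_path.parent.mkdir(parents=True, exist_ok=True)
        chunk_path.write_bytes(data)
        return True
```

A threading.Lock only protects threads inside one process. If the server runs several worker processes, some cross-process mechanism would be needed instead.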
I think there is some kind of lock already in use, if I remember correctly. What I was thinking is this: if there are two du.Upload components, both uploading my_giga_data.csv at the same time, and both JS components send a my_giga_data chunk with some resumableChunkNumber (and perhaps some resumableIdentifier), the chunks should be written in such a way that they cannot overwrite each other. So perhaps:
- my_giga_data.csv_part_001_unique_id_from_upload_component
- my_giga_data.csv_part_001_second_unique_id_from_upload_component
etc. Then, when the uploads are done, a decision has to be made about the final file:
- first, my_giga_data.csv gets created by upload component 1
- then, either my_giga_data.csv gets created again (overwritten!) by component 2, or my_giga_data (2).csv is saved (as in Windows, for example).
Maybe there are some other options, too?
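Both ideas could be sketched roughly as follows. The helper names `chunk_name` and `non_clashing_path` and the component ids are made up for illustration and are not part of dash-uploader:

```python
from pathlib import Path


def chunk_name(filename: str, chunk_number: int, component_id: str) -> str:
    """Build a chunk file name that embeds the uploading component's id."""
    return f"{filename}_part_{chunk_number:03d}_{component_id}"


def non_clashing_path(target: Path) -> Path:
    """Return target if it is free, else 'stem (2).ext', 'stem (3).ext', ..."""
    if not target.exists():
        return target
    n = 2
    while True:
        candidate = target.with_name(f"{target.stem} ({n}){target.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

Note that the existence check in `non_clashing_path` is not atomic. Two components finishing at the same moment could still pick the same name, so the check and the final write would have to happen under a lock.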
Related incoming changes in this PR: https://github.com/np-8/dash-uploader/pull/36
Thank you for your explanation. I think there are two cases in the current implementation.
- If upload_id is used, the files are separated into subfolders. Chunk files are saved as <root>/<upload_id>/<resumableIdentifier>-<file_name>/<file_name>_part_<number>, and the final uploaded file is saved as <root>/<upload_id>/<file_name>. So I think there would be no conflicts between different uploaders.
- If upload_id is not used, all files are saved in the same folder, so conflicts may happen. Chunk files are saved as <root>/<resumableIdentifier>-<file_name>/<file_name>_part_<number>, and the uploaded file is saved as <root>/<file_name>. It seems that the resumableIdentifier is always the same for the same file, so both conflicts between chunks and conflicts between saved files may happen. If your lock works, the conflicts between chunks may be handled.
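To make the two cases concrete, here is a small sketch that builds the paths exactly as described above. The function names and the example identifier value are assumptions for illustration, not dash-uploader's own code:

```python
from pathlib import Path
from typing import Optional


def upload_dir(root: str, upload_id: Optional[str] = None) -> Path:
    """<root>/<upload_id> when upload_id is given, otherwise <root>."""
    return (Path(root) / upload_id) if upload_id else Path(root)


def chunk_path(root: str, resumable_identifier: str, file_name: str,
               number: int, upload_id: Optional[str] = None) -> Path:
    """Path of one chunk file, following the layout described in the thread."""
    folder = upload_dir(root, upload_id) / f"{resumable_identifier}-{file_name}"
    return folder / f"{file_name}_part_{number}"


def final_path(root: str, file_name: str,
               upload_id: Optional[str] = None) -> Path:
    """Path of the assembled file."""
    return upload_dir(root, upload_id) / file_name


# Made-up identifier; the same file yields the same identifier in both uploaders.
ident = "12345-my_giga_datacsv"

# Without upload_id, two uploaders map to identical paths (conflict).
same_a = chunk_path("uploads", ident, "my_giga_data.csv", 1)
same_b = chunk_path("uploads", ident, "my_giga_data.csv", 1)

# With distinct upload_ids, the paths are separated by subfolder.
sep_a = chunk_path("uploads", ident, "my_giga_data.csv", 1, upload_id="id-a")
sep_b = chunk_path("uploads", ident, "my_giga_data.csv", 1, upload_id="id-b")
```

The separation in the first case only holds when the two uploaders are given different upload_id values.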
I have reviewed your code for the lock. It seems that the lock is implemented with a file rather than a typical threading.Lock. I did not know a lock could be implemented that way. After I finish the pull request for Flask and cross-domain support, I will start some tests with multiple uploaders and check how it works.
My expectation is that if upload_id is not set, the files are uploaded to the same location. In that case, using two uploaders to upload the same file seems very strange, because they would share the same progress and update the same file chunks. Things also get tricky after the upload, because the user-implemented callbacks may cause conflicts as well. I wonder why we need this feature. If we just want to accelerate uploading, we should focus on simultaneousUploads.
Thank you!
Oh yeah the locking is implemented by a file. I don't know why (it was there when I forked this), and as it works I have not needed to touch it.
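For readers unfamiliar with file-based locks, one common way to build one is to rely on atomic exclusive file creation. This is only a generic sketch under that assumption; `file_lock` is a hypothetical helper and may differ from how the lock in this project actually works:

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path: Path, timeout: float = 5.0, poll: float = 0.01):
    """Hold a lock represented by the existence of lock_path."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one caller at a time can create it and thereby own the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the lock file.
        os.close(fd)
        os.remove(lock_path)
```

Unlike threading.Lock, a lock like this is also visible to other processes that share the same folder. The downside is that a crashed process can leave a stale lock file behind.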
I have been thinking that someone might want to have an app with two upload components. Something like this:
That is exactly what I want to test. I think I could write some code to check the performance in this case once I finish the other in-progress PRs.
Yeah this definitely needs very detailed automated testing as adding two upload components makes things a lot more complicated. I think this will proceed very nicely after there is some sort of test setup.