Support for multiple dash-uploader components on same app

Open fohrloop opened this issue 4 years ago • 7 comments

In the future, it might be useful to be able to add multiple du.Upload components to the same app. This has some implications:

  • First, there should be one Flask route configured per Upload component.
  • The uploads should not clash with one another: even if the user has three components uploading my_big_data.csv (same filename), all the chunks should be fully independent of each other.
  • Configuring callbacks must take this into account.

fohrloop avatar Apr 26 '21 16:04 fohrloop

The synchronization problem mentioned in this issue may require modifications to the BaseHttpRequestHandler. Each time we write a chunk file or check whether a chunk file exists, we need to hold a lock to protect the writing thread.
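To make the idea concrete, here is a minimal sketch of a check-then-write guarded by a single lock. The function name `write_chunk_once` and the module-level `_chunk_lock` are hypothetical and are not taken from dash-uploader; the sketch only covers threads inside one process.

```python
import os
import threading

# Hypothetical module-level lock; not part of dash-uploader.
_chunk_lock = threading.Lock()


def write_chunk_once(chunk_path, data):
    """Write a chunk unless it already exists; return True if it was written."""
    with _chunk_lock:
        # The existence check and the write happen under the same lock,
        # so two threads cannot both decide that the chunk is missing.
        if os.path.exists(chunk_path):
            return False
        with open(chunk_path, "wb") as f:
            f.write(data)
        return True
```

A single global lock serializes all chunk writes, which is simple but may be slower than one lock per chunk path.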

cainmagi avatar Apr 28 '21 15:04 cainmagi

I think some kind of locks are already in use, if I remember correctly. What I was thinking is this: if there are two du.Upload components, both uploading my_giga_data.csv at the same time, and both JS components send a my_giga_data chunk with some resumableChunkNumber (and perhaps some resumableIdentifier), the chunks should be written in such a way that they cannot overwrite each other. So perhaps:

  • my_giga_data.csv_part_001_unique_id_from_upload_component
  • my_giga_data.csv_part_001_second_unique_id_from_upload_component
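A naming scheme like the one above could be sketched as follows. `chunk_filename` and `new_component_id` are hypothetical helpers, and using a random UUID as the per-component id is just one assumption about where the unique id might come from.

```python
import uuid


def new_component_id():
    """Return a random id for one upload component (assumption: a UUID is acceptable)."""
    return uuid.uuid4().hex


def chunk_filename(file_name, chunk_number, component_id):
    """Build a chunk name that is unique per upload component."""
    # Zero-pad the chunk number to three digits, as in the names above.
    return f"{file_name}_part_{chunk_number:03d}_{component_id}"
```

With two different component ids, the first chunk of the same file gets two distinct names, so neither upload can overwrite the other's chunk.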

etc. Then, when the upload is done, a decision has to be made about the final file:

  • first my_giga_data.csv gets created by upload component 1
  • then my_giga_data.csv gets created again (overridden!) by component 2, or my_giga_data (2).csv is saved (as in Windows, for example).
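The second option (saving under a numbered name, Windows-style) could look roughly like this. `non_clashing_path` is a hypothetical helper, not an existing dash-uploader function.

```python
from pathlib import Path


def non_clashing_path(target):
    """Return target if it is free, else 'name (2).ext', 'name (3).ext', ..."""
    target = Path(target)
    if not target.exists():
        return target
    counter = 2
    while True:
        # Insert the counter between the stem and the suffix.
        candidate = target.with_name(f"{target.stem} ({counter}){target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```

Note that the check and the later file creation are separate steps, so two components finishing at the same moment would still need a lock around this call and the write that follows it.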

Maybe there are some other options, too?

fohrloop avatar Apr 28 '21 19:04 fohrloop

Related incoming changes in this PR: https://github.com/np-8/dash-uploader/pull/36

fohrloop avatar Apr 28 '21 19:04 fohrloop

Thank you for your explanation. I think there are two cases in the current implementation.

  1. If upload_id is used, the files are separated into subfolders. The chunk file is saved as <root>/<upload_id>/<resumableIdentifier>-<file_name>/<file_name>_part_<number>. The final uploaded file is saved as <root>/<upload_id>/<file_name>. So I think there would be no conflicts between different uploaders.
  2. If upload_id is not used, all files are saved in the same folder, so conflicts may happen. The chunk file is saved as <root>/<resumableIdentifier>-<file_name>/<file_name>_part_<number>, and the uploaded file is saved as <root>/<file_name>. It seems that the resumableIdentifier is always the same for the same file. So both conflicts between chunks and conflicts between saved files may happen. If your lock works, the conflicts between chunks may be handled.
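The two layouts can be written out as a small sketch. `chunk_path` and `final_path` are hypothetical helpers that only mirror the path patterns described above; they are not the library's own functions.

```python
from pathlib import Path


def _base(root, upload_id=None):
    # With an upload_id, everything lives in its own subfolder of root.
    return Path(root) / upload_id if upload_id else Path(root)


def chunk_path(root, resumable_identifier, file_name, number, upload_id=None):
    """<root>[/<upload_id>]/<resumableIdentifier>-<file_name>/<file_name>_part_<number>"""
    folder = f"{resumable_identifier}-{file_name}"
    return _base(root, upload_id) / folder / f"{file_name}_part_{number}"


def final_path(root, file_name, upload_id=None):
    """<root>[/<upload_id>]/<file_name>"""
    return _base(root, upload_id) / file_name
```

Two uploaders with different upload_id values get different chunk and final paths for the same file, while two uploaders without an upload_id map to exactly the same paths, which is where the conflict in case 2 comes from.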

I have reviewed your code for the lock. It seems that the lock is implemented with a file, not a typical threading.Lock. I did not know that a lock could be implemented this way. After I finish the pull request for supporting flask and cross-domain, I will start some tests with multiple uploaders and check how it works.
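For reference, a file-based lock in general can be sketched as below. This is not the lock used in dash-uploader; it is only an illustration of the technique, and `file_lock`, `timeout` and `poll_interval` are hypothetical names. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, timeout=5.0, poll_interval=0.05):
    """Hold an exclusive lock represented by the existence of lock_path."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Creation fails with FileExistsError if another holder already
            # created the lock file, so only one caller can succeed.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the lock file.
        os.close(fd)
        os.remove(lock_path)
```

Unlike threading.Lock, a lock file is visible to other processes as well, which may matter when the server runs several workers. A crashed holder leaves a stale lock file behind, so a real implementation would need some cleanup strategy.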

As I understand it, if upload_id is not set, the files are uploaded to the same location. In this case, using two uploaders to upload the same file seems very strange, because they would share the same progress and update the same file chunks. It also gets tricky after the upload, because user-implemented callbacks may conflict as well. I wonder why we need this feature. If we just want to accelerate uploading, we should focus on simultaneousUploads.

Thank you!

cainmagi avatar May 03 '21 14:05 cainmagi

Oh yeah the locking is implemented by a file. I don't know why (it was there when I forked this), and as it works I have not needed to touch it.

I have been thinking that someone might want an app with two upload components. Something like this:

fohrloop avatar May 03 '21 20:05 fohrloop

That is exactly what I want to test. I think I could write some code to check the performance in this case once I finish the other in-progress PRs.

cainmagi avatar May 03 '21 21:05 cainmagi

Yeah, this definitely needs very detailed automated testing, as adding two upload components makes things a lot more complicated. I think this will proceed very nicely once there is some sort of test setup.

fohrloop avatar May 04 '21 06:05 fohrloop