API/HTTP file upload over ~500 MB fails and crashes dataverse-docker, while upload via the web interface usually works
What steps does it take to reproduce the issue? Upload a file larger than about 500 MB (a 435 MB file still worked) to a dataset via the API, for example with pyDVuploader or DVCLI (Rust); curl commands are likely affected as well. This was found after a (now fixed) bug in pyDVuploader: https://github.com/gdcc/python-dvuploader/issues/47#issuecomment-3442453456 Note the external reproduction by @JR-1991: https://github.com/gdcc/python-dvuploader/issues/47#issuecomment-3443037594
Upload via the web interface does work (usually). In our setup we "upload" from localhost, which is relatively fast. However, throttling localhost to a "normal" network speed does NOT solve the issue, so it is not speed related.
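For anyone who wants to try reproducing this, the steps above can be sketched in Python roughly as follows. This is a minimal sketch, not what we actually ran: it assumes the native API "add file to dataset" endpoint is called through curl (which this thread only suspects, but has not confirmed, to trigger the crash). The helper names `make_random_file`, `build_upload_command` and `reproduce` are made up for illustration, and the server URL, API token and persistent ID are placeholders.

```python
import json
import os
import subprocess


def make_random_file(path, size_mb, chunk_mb=16):
    """Write size_mb MiB of random bytes to path, chunk by chunk.

    Returns the resulting file size in bytes.
    """
    remaining = size_mb * 1024 * 1024
    chunk = chunk_mb * 1024 * 1024
    with open(path, "wb") as fh:
        while remaining > 0:
            n = min(chunk, remaining)
            fh.write(os.urandom(n))
            remaining -= n
    return os.path.getsize(path)


def build_upload_command(server_url, api_token, persistent_id, file_path):
    """Build a curl argv for the native API 'add file to dataset' call."""
    url = (
        f"{server_url.rstrip('/')}/api/datasets/:persistentId/add"
        f"?persistentId={persistent_id}"
    )
    # Optional file metadata; the description text is arbitrary.
    json_data = json.dumps({"description": "large upload test"})
    return [
        "curl",
        "-H", f"X-Dataverse-key:{api_token}",
        "-X", "POST",
        "-F", f"file=@{file_path}",
        "-F", f"jsonData={json_data}",
        url,
    ]


def reproduce(server_url, api_token, persistent_id, size_mb=600):
    """Create a random file above the ~500 MB threshold and upload it.

    All arguments are placeholders to be filled in for your own instance.
    Returns curl's exit code.
    """
    path = f"random_{size_mb}mb.bin"
    make_random_file(path, size_mb)
    cmd = build_upload_command(server_url, api_token, persistent_id, path)
    return subprocess.run(cmd, check=False).returncode
```

Calling `reproduce("http://localhost:8080", "<API_TOKEN>", "<PERSISTENT_ID>")` with real values would create a 600 MiB file in the current directory and attempt the upload; nothing runs on import.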
When does this issue occur?
During the upload, progress stalls (in testing, a bit before 500 MB). After less than half a minute, the upload continues slowly. At 100%, however, the upload command does not finish. Opening the web GUI reveals that the server / Docker container has crashed. The upload command keeps running for ~10 minutes, after which a read error is usually thrown (see the issue linked above).
What happens? Docker / the web server restarts and the upload fails. Sometimes the uploaded file is actually stored in the dataset (smaller files?), but in most cases it is not and the upload fails completely.
To whom does it occur (all users, curators, superusers)? Superuser via API, likely also others
What did you expect to happen? The file upload completes and the file is stored in the dataset.
Which version of Dataverse are you using? 6.8 (also occurred with 6.6), running as a Docker instance
Any related open or closed issues to this bug report? https://github.com/gdcc/python-dvuploader/issues/47 We did not find an existing issue in the main Dataverse repository.
Screenshots: Please see related bug report in pyDVuploader
Are you thinking about creating a pull request for this issue? We have no idea what causes the issue. We might circumvent the bug by using local S3 storage. However, this would complicate our setup, including our development setup.
@JulianRein interesting. Can you reproduce this with the Docker images you can spin up by following https://guides.dataverse.org/en/6.8/container/running/demo.html#quickstart ?
@pdurbin I just tested it again on the linked "virgin" Dataverse Docker setup. (I did not know whether @JR-1991 used that for his reproduction or not.) So I can confirm: same behavior, the upload stutters shortly before 500 MB and crashes afterwards. (Tested with a 1.5 GB random binary file this time.)
@slebedeva in our team solved this with changes to the container. It seems the web GUI and API uploads are stored in different directories inside the Docker container, hence the different behavior. The underlying cause is a lack of space within the container, which can be remedied by, for example, mounting a folder from the host file system into the container. This might be a good change to the general config to avoid this very confusing issue.
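To check whether a given container is affected by the space issue described above, free space can be inspected from inside the container before uploading. The following is only a minimal sketch: this thread does not say which directories the API upload path uses, so the directory list is a placeholder, and `free_space_report` is a hypothetical helper name.

```python
import shutil


def free_space_report(paths, needed_bytes):
    """Report free space for each path.

    Returns {path: (free_bytes, enough)} where `enough` is True if at
    least needed_bytes are free, or {path: None} if the path cannot be
    inspected (for example because it does not exist).
    """
    report = {}
    for p in paths:
        try:
            usage = shutil.disk_usage(p)
        except OSError:
            report[p] = None
            continue
        report[p] = (usage.free, usage.free >= needed_bytes)
    return report


def print_report(paths, needed_bytes):
    """Print one line per path in a human-readable form."""
    for p, entry in free_space_report(paths, needed_bytes).items():
        if entry is None:
            print(f"{p}: not accessible")
        else:
            free, enough = entry
            status = "ok" if enough else "TOO SMALL"
            print(f"{p}: {free / 1024**2:.0f} MiB free ({status})")
```

For example, `print_report(["/tmp"], 1536 * 1024**2)` run inside the container would show whether `/tmp` could hold a 1.5 GB file; `/tmp` is an assumption here and should be replaced with whatever directories the upload actually writes to in your setup.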
@JulianRein @slebedeva interesting. I have this on the agenda for the next containerization meeting. You are both welcome to join, of course! (Svetlana joined a while back.)