waterbutler
[SVCS-353] Look for Dataverse renamed files on upload
Dataverse 'ingests' certain file types and renames them in the process. On upload, when Waterbutler tries to find the correct metadata to return, it returns a 500 because it was not looking for the renamed file.
Ticket
https://openscience.atlassian.net/browse/SVCS-353
Purpose
Uploading some file types to Dataverse would cause waterbutler to throw a 500 error. The file would still upload, but with a new extension.
Changes
Allow waterbutler to return the correct metadata based on the current file name, and add a new property called file.original_name. This new property parses the DV raw metadata to figure out what the original file type was. This way WB can return the correct metadata and won't throw a 500.
Added tests for the new functions.
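The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `originalFormatLabel` key, the `FORMAT_TO_EXT` mapping and its labels, and both function names are assumptions made for the example.

```python
# Hypothetical mapping from a DV format label to the original extension.
FORMAT_TO_EXT = {
    'Comma Separated Values': '.csv',
    'MS Excel (XLSX)': '.xlsx',
    'Stata Binary': '.dta',
}


def original_name(raw):
    """Guess the pre-ingestion name of a file from its raw metadata dict."""
    name = raw['name']
    # 'originalFormatLabel' is an assumed key, used here for illustration only.
    ext = FORMAT_TO_EXT.get(raw.get('originalFormatLabel'))
    if ext is None:
        return name  # not ingested, so the name is unchanged
    stem = name.rsplit('.', 1)[0]
    return stem + ext


def find_uploaded(entries, uploaded_name):
    """Return the first entry whose current or original name matches."""
    for raw in entries:
        if uploaded_name in (raw['name'], original_name(raw)):
            return raw
    return None
```

With entries such as `{'name': 'data.tab', 'originalFormatLabel': 'Comma Separated Values'}`, searching for `data.csv` would match the ingested `data.tab` entry instead of finding nothing.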
Side effects
Currently DV just deletes old files if you try to upload a conflicting file name. (Unsure whether this is correct functionality or not.)
Because DV changes the file names when uploading certain file types (csv, xlsx, and others), it won't be able to delete the original because it will look like it does not exist. Instead, DV uses its own renaming scheme.
I had thought of a few ways to work around this, but decided against them because they would add quite a few more DV API calls to the process. If it is determined in CR that we want to do that, then it can be implemented.
QA Notes
The affected file types are as follows: .csv, .RData, .sav, .dta, .por, .xlsx
If you upload one of these file types to DV, it will be converted into a .tab file via their ingestion process.
Steps:
- there are some sample file types on the JIRA ticket to use.
- do not batch upload all the files (DV will return a 400 since it will get stuck on one of the ingestion operations)
- One by one, upload each file
- it should complete successfully and show up in fangorn as <name.tab>.
- clicking on it should go to the right file.
Steps for 'side-effects' notes:
- upload a file for the second time
- it will look like 2 of the same file are on the page
- refresh the page and one of them should show up with a -1
Deployment Notes
Coverage increased (+0.03%) to 89.136% when pulling 3ada43bfb31ad5ca0a6dc719812a85aba0dd3576 on AddisonSchiller:feature/SVCS-353-davaverse-csv-bug into 473191c78c36b6ee63d4609d3952a317ee4ab63e on CenterForOpenScience:develop.
Changes on last commit:
original_name -> original_names, and it is now a list.
This gets rid of the need for "RData", "rdata" and "Rdata" etc. It now returns a list of possible names to look for metadata.
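As a rough sketch of the list-based approach, assuming a hypothetical `FORMAT_TO_EXTS` mapping and format labels (neither is taken from this PR), each format can carry every spelling of its extension so callers only need a membership check:

```python
# Hypothetical mapping: one format label, every extension spelling to try.
FORMAT_TO_EXTS = {
    'R Data': ['.RData', '.rdata', '.Rdata'],
    'Comma Separated Values': ['.csv'],
}


def original_names(name, format_label):
    """Return every name the file may have had before ingestion."""
    exts = FORMAT_TO_EXTS.get(format_label)
    if exts is None:
        return [name]  # not ingested: the current name is the only candidate
    stem, dot, _ = name.rpartition('.')
    if not dot:
        stem = name  # no extension to strip
    return [stem + ext for ext in exts]
```

A caller can then test `uploaded_name in original_names(current_name, label)` without special-casing capitalization.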
If you upload a large RData file, it takes a while for Dataverse to convert it. Any uploads during that time will get a 400 back saying ingestion is currently clogged. The provider will now look for these messages and display a message to the user that the upload process is clogged and to wait a few seconds and try again.
To test the above, upload an RData file. Once its loading bar finishes, immediately upload another ingested file type. It should throw the error.
Added new tests for this exception handling.
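The clogged-ingestion handling described above might look something like the sketch below. The exception name, the function name, and the substring check on the response body are all assumptions for illustration; the actual message text DV returns is not reproduced here.

```python
class IngestionBusyError(Exception):
    """Hypothetical error raised when DV is still ingesting a prior upload."""


def check_upload_response(status, body):
    """Raise a user-facing error if DV rejected the upload due to ingestion."""
    # Assumption: the 400 body mentions ingestion somewhere in its text.
    if status == 400 and 'ingest' in body.lower():
        raise IngestionBusyError(
            'Dataverse is still processing a previous upload. '
            'Wait a few seconds and try again.'
        )
```

Other 400 responses fall through untouched, so unrelated upload errors keep their existing handling.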
Coverage increased (+0.05%) to 89.992% when pulling c80368466b22d692f1028e899398e5d4873a84b8 on AddisonSchiller:feature/SVCS-353-davaverse-csv-bug into 26bf2093c15af333e634f14372709e7bf014ccb4 on CenterForOpenScience:develop.