waterbutler [SVCS-353] Look for Dataverse renamed files on upload

Dataverse 'ingests' certain file types and renames them in the process. On upload, when WaterButler tries to find the correct metadata to return, it responds with a 500 because it was not looking for the renamed file.

Ticket

https://openscience.atlassian.net/browse/SVCS-353

Purpose

Uploading certain file types to Dataverse caused WaterButler to throw a 500 error. The file would still upload, but with a new extension.

Changes

Allow WaterButler to return the correct metadata based on the current file name, and add a new property called file.original_name. This property parses the raw DV metadata to figure out what the original file type was, so WB can return the correct metadata instead of throwing a 500. Added tests for the new functions.
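The lookup described above can be sketched roughly as follows. This is only an illustration, not WaterButler's actual code: the `find_uploaded` helper and the dict shape (a `name` key plus an optional `original_name` key) are hypothetical stand-ins for the provider's metadata objects.

```python
from typing import Optional


def find_uploaded(entries: list, uploaded_name: str) -> Optional[dict]:
    """Return the entry that matches the uploaded file, or None.

    Each entry is a hypothetical dict with a 'name' key (the current name in
    Dataverse) and an optional 'original_name' key (the pre-ingest name).
    """
    for entry in entries:
        # Match on the current name first, then on the pre-ingest name.
        if entry['name'] == uploaded_name:
            return entry
        if entry.get('original_name') == uploaded_name:
            return entry
    return None


entries = [
    {'name': 'notes.txt'},
    {'name': 'data.tab', 'original_name': 'data.csv'},
]
match = find_uploaded(entries, 'data.csv')
print(match['name'])
```

With only the current name to compare against, the `data.csv` upload would find no match, which is the situation that previously ended in a 500.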

Side effects

Currently DV just deletes the old file if you try to upload a file with a conflicting name (unsure whether this is the correct functionality). Because DV changes the file names when ingesting certain file types (csv, xlsx, and others), it won't be able to delete the original, because the original will look like it does not exist. Instead, DV uses its own renaming scheme and appends a numeric suffix such as -1 or -2 to the conflicting file name. WB won't look for these suffixed names, so when a file that DV renames gets uploaded and picks up a suffix, WB returns the metadata of the non-suffixed version instead. This shouldn't be much of a problem, as the returned metadata is really only used for updating fangorn. A page refresh corrects the display.

I thought of a few ways to avoid this, but decided against them because they would add quite a few more DV API calls to the process. If CR determines that we want this, it can be implemented.

QA Notes

The affected file types are as follows: .csv, .RData, .sav, .dta, .por, .xlsx. If you upload one of these file types to DV, it will be converted into a .tab file via their ingestion process.

Steps:

There are some sample files of these types on the JIRA ticket to use.
Do not batch upload all the files (DV will return a 400 since it will get stuck on one of the ingestion operations).
Upload each file one by one.
Each upload should complete successfully and show up in fangorn as <name.tab>.
Clicking on it should go to the right file.

Steps for 'side-effects' notes:

Upload a file for the second time.
It will look like 2 copies of the same file are on the page.
Refresh the page and one of them should show up with a -1.

Deployment Notes

Nov 13 '17 16:11 AddisonSchiller

Coverage increased (+0.03%) to 89.136% when pulling 3ada43bfb31ad5ca0a6dc719812a85aba0dd3576 on AddisonSchiller:feature/SVCS-353-davaverse-csv-bug into 473191c78c36b6ee63d4609d3952a317ee4ab63e on CenterForOpenScience:develop.

Nov 13 '17 17:11 coveralls

Changes on last commit:

original_name -> original_names, and it is now a list. This gets rid of the need for "RData", "rdata", "Rdata", etc. It now returns a list of possible names to look for in the metadata.
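One way such a list of possible names could be built is sketched below. This is an assumption-laden illustration rather than the code in this PR: `candidate_original_names` is a hypothetical helper, the extension list is taken from the QA notes, and enumerating case variants is just one guess at how the list is populated.

```python
import os

# Extensions named in the QA notes as being ingested into .tab files.
INGESTED_EXTENSIONS = ['csv', 'RData', 'sav', 'dta', 'por', 'xlsx']


def candidate_original_names(current_name: str) -> list:
    """Return possible pre-ingest names for a .tab file (hypothetical helper)."""
    stem, ext = os.path.splitext(current_name)
    if ext != '.tab':
        # Only ingested files end up as .tab, so nothing to guess otherwise.
        return []
    names = []
    for original_ext in INGESTED_EXTENSIONS:
        # Try the extension as listed plus common case variants.
        variants = (
            original_ext,
            original_ext.lower(),
            original_ext.upper(),
            original_ext.capitalize(),
        )
        for variant in variants:
            name = '{}.{}'.format(stem, variant)
            if name not in names:
                names.append(name)
    return names


print(candidate_original_names('survey.tab'))
```

The caller can then check whether the uploaded name appears anywhere in the returned list instead of comparing against a single guessed name.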

If you upload a large RData file, it takes a while for Dataverse to convert it. Any upload during that time gets a 400 back saying that ingestion is currently clogged. The provider now looks for these messages and tells the user that the upload process is clogged and to wait a few seconds and try again.
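The check described above might look something like this sketch. The exception class, the function name, and especially the `marker` text are hypothetical; the exact wording of the Dataverse 400 message is not given here, so the default marker is only a placeholder.

```python
class IngestBusyError(Exception):
    """Hypothetical error for an upload rejected while an ingest is running."""


def check_upload_response(status: int, body: str, marker: str = 'ingest') -> None:
    """Raise IngestBusyError if a 400 response mentions the ingest marker.

    The default marker is an assumption, not the real Dataverse message text.
    """
    if status == 400 and marker.lower() in body.lower():
        raise IngestBusyError(
            'Dataverse is still ingesting a previous upload. '
            'Wait a few seconds and try again.'
        )


try:
    check_upload_response(400, 'File ingest is currently in progress')
except IngestBusyError as exc:
    print(exc)
```

Other 400 responses pass through this check untouched, so they can still be handled by whatever generic error handling already exists.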

To test the above, upload an RData file. Once its load bar finishes, immediately upload another ingested file type. It should throw the error.

Added new tests for this exception handling.

Nov 21 '17 18:11 AddisonSchiller

Coverage increased (+0.05%) to 89.992% when pulling c80368466b22d692f1028e899398e5d4873a84b8 on AddisonSchiller:feature/SVCS-353-davaverse-csv-bug into 26bf2093c15af333e634f14372709e7bf014ccb4 on CenterForOpenScience:develop.

Nov 21 '17 21:11 coveralls

Coverage increased (+0.05%) to 89.992% when pulling 9f0cdbbefe4791b8e487390a0784d51b6da260dd on AddisonSchiller:feature/SVCS-353-davaverse-csv-bug into 26bf2093c15af333e634f14372709e7bf014ccb4 on CenterForOpenScience:develop.

Nov 30 '17 15:11 coveralls
