gget
Dataverse data access module
Aim
Include a draft Dataverse data collector as part of the gget toolbox.
e.g. `gget dataverse -j dataverse.json -o path-to-dir`
Tasks
- [x] make a Python module and CLI for data collection from Dataverse
- [x] collect data for a list of datasets defined in a JSON file
- [ ] build a JSON file listing the datasets deposited under a given Dataverse DOI at a given version
- [ ] collect data by providing a Dataverse DOI and version
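To illustrate the JSON-driven task above, here is a minimal sketch of how a manifest could be turned into download targets. The manifest layout (a `datasets` list with `name`, `id`, and `type` keys), the function names, and the Harvard Dataverse base URL are assumptions made for illustration and are not necessarily what this PR implements. The only part taken from the Dataverse guides is the `/api/access/datafile/{id}` endpoint of the Data Access API.

```python
import json
from pathlib import PurePosixPath

# Assumed installation; any Dataverse base URL could be substituted.
BASE_URL = "https://dataverse.harvard.edu"


def datafile_url(file_id, base_url=BASE_URL):
    """Return the Data Access API URL for one file, addressed by numeric id."""
    return f"{base_url}/api/access/datafile/{int(file_id)}"


def plan_downloads(manifest_text, out_dir):
    """Turn a JSON manifest into (url, destination) pairs without downloading.

    The manifest schema used here is hypothetical:
    {"datasets": [{"name": ..., "id": ..., "type": ...}, ...]}
    """
    manifest = json.loads(manifest_text)
    plan = []
    for entry in manifest["datasets"]:
        # Destination file name is built from the entry name and its extension.
        dest = PurePosixPath(out_dir) / f"{entry['name']}.{entry['type']}"
        plan.append((datafile_url(entry["id"]), str(dest)))
    return plan
```

Keeping the planning step separate from the actual download means the URL and path logic can be checked without network access.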
History
As part of https://github.com/GilbertLabUCSF/CanDI/issues/34, related to #121, I drafted a module to collect data from Dataverse. I then thought that data collection from Dataverse could be a more general functionality here, as part of gget.
I mostly borrowed code from gears/utils.py or TDC/load.py. However, there are other efforts, such as pyDataverse and easyDataverse.
References
- https://guides.dataverse.org/en/latest/api/dataaccess.html
- https://github.com/mims-harvard/TDC/blob/main/tdc/utils/load.py
cc @amva13
Hi Abe, this is great work, thank you! I'm pretty busy right now but I'll try to start merging asap.
I just have two thoughts at the moment:
- We'll need to add unit tests for this module. Do you have any ideas on what they should look like?
- A JSON file is a relatively complex input type. Is there perhaps a way around this to simplify the input to the module?
Hi Laura,

> We'll need to add unit tests for this module. Do you have any ideas on what they should look like?

I'll think about it and add more commits asap.

> A JSON file is a relatively complex input type. Is there perhaps a way around this to simplify the input to the module?

That's true. I think this can work with a Dataverse DOI and/or just a fileId. However, I personally think a JSON file can be a good option for listing all files related to a single project or topic. I'll try to add more commits in this regard.
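As a rough sketch of both points (a simpler input and something unit-testable), the module could accept either a numeric fileId or a dataset DOI and take the download function as a parameter, so tests can swap in a fake and never touch the network. The function names, the `fetch` parameter, and the base URL are hypothetical. The two endpoint shapes are the file-level and dataset-level routes of the Dataverse Data Access API.

```python
from urllib.request import urlretrieve

# Assumed installation; any Dataverse base URL could be substituted.
BASE_URL = "https://dataverse.harvard.edu"


def access_url(identifier, base_url=BASE_URL):
    """Map a numeric fileId or a 'doi:' dataset identifier to an access URL."""
    text = str(identifier).strip()
    if text.isdigit():
        # Single file, addressed by its database id.
        return f"{base_url}/api/access/datafile/{text}"
    if text.startswith("doi:"):
        # Whole dataset, addressed by persistent id (served as a zip archive).
        return f"{base_url}/api/access/dataset/:persistentId/?persistentId={text}"
    raise ValueError(f"expected a numeric fileId or a 'doi:' identifier, got {text!r}")


def download(identifier, dest, fetch=urlretrieve):
    """Download one target to dest; fetch is injectable so tests stay offline."""
    url = access_url(identifier)
    fetch(url, dest)
    return url
```

A unit test can then pass something like `fetch=lambda url, dest: calls.append((url, dest))` and assert on the recorded URL and destination instead of downloading real data.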
I made this a draft PR while I'm adding more commits.
Oops sorry didn't mean to close this
Oh no worries, I also didn't get a chance to finish what we discussed before. I have the changes in a fork, so I'll open another PR in the future. Thanks :)
No need, I'll reopen this one as soon as the unit tests are passing and I can make a new dev branch
I did it again lol sorry
https://github.com/mims-harvard/TDC/issues/314#issuecomment-2396435932