Feature request(s) for rehydrate
Brad,
I don't know if this is appropriate (?) or even possible to request here, so let me know if not. I didn't see any instructions for contributions in the repo or on the PyPI index for the Python interface.
I think the `rehydrate` function would be significantly improved if:
- `rehydrate` prints the list of files, i.e. a `--list` flag by default. `--dl` (or something similar) could be substituted for `--list` to enable users to flag when they want to download the files. Since there's a possibility of downloading hundreds to thousands of files depending on the setting, I think it makes sense to let users know that before diving in to do the actual download.
- Instead of using a `--directory` flag, take the directory as a positional argument. A directory is required to rehydrate at any rate, so why not take it as the first argument? (See the sketch after this list.)
  a. If a directory is not specified, look in the current directory for the `ncbi_dataset` directory.
  b. If a directory ending with `ncbi_dataset` is specified, look in that dir instead of appending `ncbi_dataset` to the search.
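Purely to illustrate the proposed interface, here's a minimal `argparse` sketch of that behavior. This is hypothetical, not the actual `datasets` CLI; the `resolve_dataset_dir` helper and the exact flag names are assumptions:

```python
import argparse
from pathlib import Path

def resolve_dataset_dir(directory):
    """Hypothetical helper implementing (a) and (b) above:
    if the path already ends with ncbi_dataset, use it as-is;
    otherwise look for ncbi_dataset inside the given directory."""
    path = Path(directory)
    if path.name == "ncbi_dataset":
        return path
    return path / "ncbi_dataset"

parser = argparse.ArgumentParser(prog="rehydrate")
# Positional directory, defaulting to the current directory.
parser.add_argument("directory", nargs="?", default=".")
# List files by default; --dl opts in to the actual download.
parser.add_argument("--dl", action="store_true",
                    help="download the files instead of just listing them")
args = parser.parse_args()

dataset_dir = resolve_dataset_dir(args.directory)
if args.dl:
    print(f"downloading files listed under {dataset_dir} ...")
else:
    print(f"files that would be downloaded from {dataset_dir}:")
```

With this shape, `rehydrate`, `rehydrate .`, and `rehydrate ncbi_dataset` would all resolve to the same place, and nothing is downloaded unless `--dl` is passed.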
I think naive users will be able to use the tool better with these changes. It's not immediately clear that the directory that contains the `ncbi_dataset` dir is the one that should be specified.
I think most users would be doing this:
```
datasets download genome accession GCF_000023265.1 --dehydrated
unzip ncbi_dataset.zip
datasets rehydrate ... (something)
```
I specified the `--directory ncbi_datasets` first and ran into an error. Obviously not a big deal, but `--directory .` seems a little unintuitive.
Feel free to ignore/implement these as you can.
Thanks,
Ed
One other thought I had: allow specifying the zip file itself to the rehydrate function. You could easily unzip the file and get the directory structure that way:
```python
import shutil
shutil.unpack_archive('ncbi_dataset.zip')
```
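As a sketch of how that could fold into `rehydrate` itself (hypothetical; the `rehydrate` function shown here is made up for illustration, not the tool's real entry point), the command could accept either the zip or a directory:

```python
import shutil
from pathlib import Path

def rehydrate(target):
    # Hypothetical entry point: 'target' may be the dehydrated zip
    # or a directory that already contains ncbi_dataset/.
    path = Path(target)
    if path.suffix == ".zip":
        # The archive carries ncbi_dataset/ at its top level, so
        # unpack next to it and continue from there.
        shutil.unpack_archive(path, path.parent)
        path = path.parent
    print(f"would rehydrate from {path / 'ncbi_dataset'}")

# e.g. rehydrate("ncbi_dataset.zip") or rehydrate(".")
```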
Anyway, I look forward to hearing your thoughts.
Cheers
This is a great place to post ideas. Thank you!!
These are great ideas, and I've made some backlog tickets to address them internally.
And apologies it took me so long to respond. Brad
I'm going to leave this open until we have implemented and deployed the fix.