Feature request(s) for rehydrate
Brad,
I don't know if this is appropriate (?) or even possible to request here, so let me know if not. I didn't see any instructions for contributions in the repo or on the PyPI index for the Python interface.
I think the `rehydrate` function would be significantly improved if:
- `rehydrate` prints the list of files, i.e. a `--list` flag by default. `--dl` (or something similar) could be substituted for `--list` to enable users to flag when they want to download the files. Since there's a possibility of downloading hundreds to thousands of files depending on the setting, I think it makes sense to let users know that before diving in to do the actual download.
- Instead of using a `--directory` flag, take the directory as a positional argument. A directory is required to rehydrate at any rate, so why not take it as the first argument? (See the sketch after this list.)
  a. If a directory is not specified, look in the current directory for the `ncbi_dataset` directory.
  b. If a directory ending with `ncbi_dataset` is specified, look in that dir instead of appending `ncbi_dataset` to the search.
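Purely to illustrate the proposed interface, here's a minimal `argparse` sketch of that behavior. This is hypothetical, not the actual `datasets` CLI; the `resolve_dataset_dir` helper and the exact flag names are assumptions:

```python
import argparse
from pathlib import Path

def resolve_dataset_dir(directory):
    """Hypothetical helper implementing (a) and (b) above:
    if the path already ends with ncbi_dataset, use it as-is;
    otherwise look for ncbi_dataset inside the given directory."""
    path = Path(directory)
    if path.name == "ncbi_dataset":
        return path
    return path / "ncbi_dataset"

parser = argparse.ArgumentParser(prog="rehydrate")
# Positional directory, defaulting to the current directory.
parser.add_argument("directory", nargs="?", default=".")
# List files by default; --dl opts in to the actual download.
parser.add_argument("--dl", action="store_true",
                    help="download the files instead of just listing them")
args = parser.parse_args()

dataset_dir = resolve_dataset_dir(args.directory)
if args.dl:
    print(f"downloading files listed under {dataset_dir} ...")
else:
    print(f"files that would be downloaded from {dataset_dir}:")
```

With this shape, `rehydrate`, `rehydrate .`, and `rehydrate ncbi_dataset` would all resolve to the same place, and nothing is downloaded unless `--dl` is passed.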
I think naive users will be able to use the tool better with these changes. It's not immediately clear that the directory that contains the `ncbi_dataset` dir is the one that should be specified.
I think most users would be doing this:
```
datasets download genome accession GCF_000023265.1 --dehydrated
unzip ncbi_dataset.zip
datasets rehydrate ... (something)
```
I specified the `--directory ncbi_datasets` first and ran into an error. Obviously not a big deal, but `--directory .` seems a little unintuitive.
Feel free to ignore/implement these as you can.
Thanks,
Ed
One other thought I had: allow specifying the zip file itself to the rehydrate function. You could easily unzip the file and get the directory structure that way:
```python
import shutil
shutil.unpack_archive('ncbi_dataset.zip')
```
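As a sketch of how that could fold into `rehydrate` itself (hypothetical; the `rehydrate` function shown here is made up for illustration, not the tool's real entry point), the command could accept either the zip or a directory:

```python
import shutil
from pathlib import Path

def rehydrate(target):
    # Hypothetical entry point: 'target' may be the dehydrated zip
    # or a directory that already contains ncbi_dataset/.
    path = Path(target)
    if path.suffix == ".zip":
        # The archive carries ncbi_dataset/ at its top level, so
        # unpack next to it and continue from there.
        shutil.unpack_archive(path, path.parent)
        path = path.parent
    print(f"would rehydrate from {path / 'ncbi_dataset'}")

# e.g. rehydrate("ncbi_dataset.zip") or rehydrate(".")
```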
Anyway, I look forward to hearing your thoughts.
Cheers
This is a great place to post ideas. Thank you!!
These are great ideas, and I've made some backlog tickets to address them internally.
And apologies it took me so long to respond. Brad
I'm going to leave this open until we have implemented and deployed the fix.