NCBI_Mass_Downloader

Suggestion: Download along with sequence metadata

Open ChongLC opened this issue 4 years ago • 6 comments

Dear developer,

I have a suggestion. Sequence metadata is often very useful when analysing downloaded sequences. Perhaps you could consider adding an option to download it along with the sequences.

Looking forward to the feature.

Best regards, Chong

ChongLC avatar Oct 26 '21 20:10 ChongLC

Hello @ChongLC, what kind of metadata do you have in mind? It might be possible, depending on what you mean.

Best, Francisco

StuntsPT avatar Oct 26 '21 20:10 StuntsPT

Dear Francisco,

Perhaps the GenPept record (.gp)? For my part, I would like to have the source/organism details of the sequences. When downloading a huge dataset at once, it is hard to mine the source/organism information for each sequence.

Best regards, Chong

ChongLC avatar Oct 26 '21 21:10 ChongLC
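To illustrate the request above, here is a minimal sketch of how the source/organism details could be pulled out of a GenPept flat file once it has been downloaded. It assumes the usual flat-file layout, in which each record ends with a `//` line and carries `ACCESSION` and `ORGANISM` lines. The function names and the sample records are made up for illustration and are not part of NCBI_Mass_Downloader.

```python
# Hand-written, truncated sample in GenPept layout (NOT real NCBI records).
SAMPLE = """\
LOCUS       XX000001                  10 aa            linear   VRL 01-JAN-2000
DEFINITION  hypothetical example protein [Example virus A].
ACCESSION   XX000001
SOURCE      Example virus A
  ORGANISM  Example virus A
            Viruses; Exampleviridae.
ORIGIN
        1 mkvlaagtss
//
LOCUS       XX000002                  10 aa            linear   VRL 01-JAN-2000
DEFINITION  hypothetical example protein [Example virus B].
ACCESSION   XX000002
SOURCE      Example virus B
  ORGANISM  Example virus B
            Viruses; Exampleviridae.
ORIGIN
        1 mstnpkpqrk
//
"""


def iter_records(text):
    """Yield each record as a list of lines, splitting on the '//' terminator."""
    record = []
    for line in text.splitlines():
        if line.strip() == "//":
            if record:
                yield record
            record = []
        else:
            record.append(line)
    # Lines after the last '//' (an unterminated record) are ignored.


def organism_table(text):
    """Return a list of (accession, organism) tuples, one per record."""
    rows = []
    for record in iter_records(text):
        accession = organism = None
        for line in record:
            if line.startswith("ACCESSION") and accession is None:
                fields = line.split()
                accession = fields[1] if len(fields) > 1 else None
            elif line.startswith("  ORGANISM") and organism is None:
                # Only the first line of the organism name is kept.
                organism = line[len("  ORGANISM"):].strip()
        rows.append((accession, organism))
    return rows


if __name__ == "__main__":
    for accession, organism in organism_table(SAMPLE):
        print(f"{accession}\t{organism}")
```

Very long organism names can wrap onto a second line in flat files; this sketch keeps only the first line, so a more careful parser would be needed for those cases.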

Dear Chong,

Sorry about the delay. I somehow missed the notification. Are you aware of any API that can be used to get the .gp files?

Best, Francisco

StuntsPT avatar Nov 03 '21 18:11 StuntsPT

Dear Developer,

I know that NCBI provides the E-utilities. You may want to refer to their documentation (https://www.ncbi.nlm.nih.gov/books/NBK25501/).

If I understand correctly, you can download the records using the E-utilities Perl script with a slight change: my $db = "protein"; my $query = "txid10239[Organism]"; my $report = "genpept";

Over the past three years, I have been downloading with the Batch Entrez function. However, I noticed that some batches come back empty. Just for your information, in case you are not aware of it.

As I also miss notifications sometimes, perhaps we could continue the conversation by email ([email protected]) if you don't mind.

Best regards, CHONG

ChongLC avatar Nov 05 '21 17:11 ChongLC
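For readers who prefer Python over the Perl script, the same idea can be sketched roughly as follows. This only builds the ESearch and EFetch request URLs and adds a check for the empty batches mentioned above; it performs no network access. It assumes the documented E-utilities parameters (`usehistory`, `WebEnv`, `query_key`, `retstart`, `retmax`) and that `rettype=gp` with `retmode=text` returns GenPept flat files for the protein database, as listed in the EFetch documentation. All function names are hypothetical, and this is not how NCBI_Mass_Downloader itself is implemented.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, query):
    """URL for a search whose results are kept on the NCBI history server."""
    params = {"db": db, "term": query, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, web_env, query_key, retstart, retmax, rettype="gp"):
    """URL for one batch of records from a previous history-server search."""
    params = {
        "db": db,
        "WebEnv": web_env,        # taken from the ESearch XML response
        "query_key": query_key,   # taken from the ESearch XML response
        "retstart": retstart,
        "retmax": retmax,
        "rettype": rettype,       # "gp" = GenPept flat file (assumption, see above)
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def batch_starts(total, batch_size):
    """Offsets (retstart values) needed to cover `total` records."""
    return list(range(0, total, batch_size))


def is_empty_batch(text):
    """True if a fetched batch contains no flat-file record at all."""
    return not any(line.startswith("LOCUS") for line in text.splitlines())


if __name__ == "__main__":
    print(esearch_url("protein", "txid10239[Organism]"))
    # "WEBENV_PLACEHOLDER" and "1" stand in for values parsed from ESearch.
    for start in batch_starts(1200, 500):
        print(efetch_url("protein", "WEBENV_PLACEHOLDER", "1", start, 500))
```

A downloader built on this could re-request any batch for which `is_empty_batch` returns true, instead of silently writing an incomplete file.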

Dear Prof. Francisco,

It works well when tested with a small dataset download (txid 12637). The command used: python3 NCBI_downloader.py -q "txid12637[Organism:exp]" -d "protein" -nv -o denv.txt -f gb

Do close the issue if required. Thank you.

Best regards, CHONG

ChongLC avatar Jan 14 '22 08:01 ChongLC

Dear @ChongLC, Great to hear it seems to be working. I will keep the issue open until I turn this makeshift version into a real integrated part of the program.

Best, Francisco

StuntsPT avatar Jan 14 '22 16:01 StuntsPT