NCBI_Mass_Downloader Suggestion: Download along with sequence metadata

Dear developer,

I have a suggestion. Sequence metadata is often useful when analysing the downloaded sequences. Perhaps you could consider adding an option to download it along with the sequences.

Looking forward to the feature.

Best regards, Chong

Oct 26 '21 20:10 ChongLC

Hello @ChongLC, what kind of metadata do you have in mind? It might be possible, depending on what you mean.

Best, Francisco

Oct 26 '21 20:10 StuntsPT

Dear Francisco,

Perhaps the GenPept record (.gp)? For my part, I would like to have the source/organism details of the sequences. When downloading a huge dataset at once, it is hard to dig out the source/organism information for each sequence.

Best regards, Chong

Oct 26 '21 21:10 ChongLC

Dear Chong,

Sorry about the delay. I somehow missed the notification. Are you aware of any API that can be used to get the .gp files?

Best, Francisco

Nov 03 '21 18:11 StuntsPT

Dear Developer,

I know that NCBI provides the E-utilities. You may want to refer to their documentation (https://www.ncbi.nlm.nih.gov/books/NBK25501/).

If I understand correctly, you can download the records with the E-utilities Perl script after a slight change: my $db = "protein"; my $query = "txid10239[Organism]"; my $report = "genpept";
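For illustration, here is a minimal Python sketch of the same idea. It only builds the ESearch and EFetch request URLs and does not contact NCBI. The helper names (esearch_url, efetch_url) and the batch size are my own assumptions, not part of NCBI_Mass_Downloader; "gp" is the EFetch rettype for GenPept flat files in the protein database, as far as I can tell from the documentation linked above.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, query):
    """Build an ESearch URL that stores the hits on the history server."""
    params = {"db": db, "term": query, "usehistory": "y"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, web_env, query_key, retstart=0, retmax=500, rettype="gp"):
    """Build an EFetch URL for one batch of records from a stored search.

    web_env and query_key are the values returned by ESearch;
    retmax=500 is an arbitrary batch size chosen for this sketch.
    """
    params = {
        "db": db,
        "WebEnv": web_env,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": rettype,   # "gp" requests GenPept flat files
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Same database and query as in the Perl variables above.
    print(esearch_url("protein", "txid10239[Organism]"))
```

The returned URLs can then be requested with any HTTP client, one batch at a time.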

For the past three years, I have been downloading with the Batch Entrez function. However, I noticed that some batches come back empty. Just for your information, in case you are not aware of it.
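One possible way to guard against empty batches is to check each batch and request it again when nothing comes back. The sketch below is only an assumption about how this could be done, not how NCBI_Mass_Downloader handles it; fetch_batch stands for any hypothetical function that takes a start offset and a batch size and returns the batch as text.

```python
import time


def fetch_with_retry(fetch_batch, retstart, retmax, max_tries=3, wait_seconds=1.0):
    """Call fetch_batch(retstart, retmax) until it returns non-empty text.

    fetch_batch is a caller-supplied (hypothetical) download function.
    Raises RuntimeError if every attempt returns an empty batch.
    """
    for attempt in range(1, max_tries + 1):
        text = fetch_batch(retstart, retmax)
        if text and text.strip():
            return text
        if attempt < max_tries:
            # Pause before asking again, to be gentle with the server.
            time.sleep(wait_seconds)
    raise RuntimeError(
        f"batch starting at {retstart} was empty after {max_tries} tries"
    )
```

Raising an error after the last attempt, rather than silently continuing, makes a missing batch visible instead of leaving a gap in the output file.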

As I also sometimes miss notifications, perhaps we can continue the conversation by email ([email protected]), if you don't mind.

Best regards, CHONG

Nov 05 '21 17:11 ChongLC

Dear Prof. Francisco,

It looks great when tried on a small dataset (txid: 12637). The command used: python3 NCBI_downloader.py -q "txid12637[Organism:exp]" -d "protein" -nv -o denv.txt -f gb
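In case it is useful to others, the source/organism information can then be pulled out of the downloaded flat file with a few lines of Python. This is only a rough sketch: the function name is hypothetical, the sample records are made-up placeholders rather than real NCBI entries, and it assumes each organism name fits on the single ORGANISM line.

```python
def organisms_by_accession(flat_text):
    """Map each ACCESSION to its ORGANISM in GenBank/GenPept flat-file text."""
    result = {}
    accession = None
    organism = None
    for line in flat_text.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            accession = parts[1] if len(parts) > 1 else None
        elif line.strip().startswith("ORGANISM"):
            # Keep everything after the keyword as the organism name.
            organism = line.strip()[len("ORGANISM"):].strip()
        elif line.startswith("//"):
            # "//" terminates a record: store it and reset the state.
            if accession is not None:
                result[accession] = organism
            accession = None
            organism = None
    return result


# Made-up placeholder records, for illustration only.
SAMPLE = """\
LOCUS       TEST00001                100 aa            linear   VRL
ACCESSION   TEST00001
SOURCE      Example virus A
  ORGANISM  Example virus A
            Viruses; placeholder lineage.
//
LOCUS       TEST00002                120 aa            linear   VRL
ACCESSION   TEST00002
SOURCE      Example virus B
  ORGANISM  Example virus B
            Viruses; placeholder lineage.
//
"""

if __name__ == "__main__":
    for acc, org in organisms_by_accession(SAMPLE).items():
        print(acc, org, sep="\t")
```

For a real download, the text of the output file (denv.txt in the command above) would be passed in instead of SAMPLE.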

Do close the issue if required. Thank you.

Best regards, CHONG

Jan 14 '22 08:01 ChongLC

Dear @ChongLC, great to hear it seems to be working. I will keep the issue open until I turn this makeshift version into a properly integrated part of the program.

Best, Francisco

Jan 14 '22 16:01 StuntsPT