Suggestion: Download along with sequence metadata
Dear developer,
I have a suggestion: sequence metadata is usually quite useful when analysing downloaded sequences. Perhaps you could consider adding an option to download it along with the sequences.
Looking forward to the feature.
Best regards, Chong
Hello @ChongLC, what kind of metadata do you have in mind? It might be possible, depending on what you mean.
Best, Francisco
Dear Francisco,
Perhaps the GenPept record (.gp)? For my own work, I would mainly like the source/organism details of the sequences. When downloading a huge dataset at once, it is hard to dig out the source/organism information for each sequence.
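To illustrate what "source/organism details" could look like once a GenPept file is available, here is a minimal sketch that pulls the ACCESSION, SOURCE and ORGANISM lines out of GenPept flat-file text. The function name `parse_genpept_sources` and the sample record are made up for illustration; this is not part of NCBI_downloader.py, and a real pipeline might prefer an established parser.

```python
def parse_genpept_sources(text):
    """Return one dict per record with accession, source and organism.

    Assumes GenPept flat-file layout: top-level keywords start in column 1,
    ORGANISM is indented by two spaces, and each record ends with '//'.
    """
    records = []
    current = {}
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            current["accession"] = parts[1] if len(parts) > 1 else ""
        elif line.startswith("SOURCE"):
            current["source"] = line[len("SOURCE"):].strip()
        elif line.startswith("  ORGANISM"):
            current["organism"] = line[len("  ORGANISM"):].strip()
        elif line.startswith("//"):
            # End of record: store what was collected and start afresh.
            if current:
                records.append(current)
            current = {}
    return records


# Hypothetical, heavily shortened record used only to exercise the parser.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE_1                 10 aa            linear   VRL
ACCESSION   EXAMPLE_1
SOURCE      Example virus
  ORGANISM  Example virus
            Viruses; Example lineage.
FEATURES             Location/Qualifiers
     source          1..10
//
"""

if __name__ == "__main__":
    for record in parse_genpept_sources(SAMPLE_RECORD):
        print(record["accession"], record["organism"], sep="\t")
```

The lowercase `source` entry in the feature table is skipped on purpose, since only the upper-case header keywords are matched.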
Best regards, Chong
Dear Chong,
Sorry about the delay. I somehow missed the notification.
Are you aware of any API that can be used to get the .gp files?
Best, Francisco
Dear Developer,
I know that NCBI provides the E-utilities. You may want to refer to their documentation (https://www.ncbi.nlm.nih.gov/books/NBK25501/).
If I understand correctly, you could download the records with the E-utilities Perl script after a slight change:
my $db = "protein";
my $query = "txid10239[Organism]";
my $report = "genpept";
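For reference, the same idea can be sketched in Python by building the ESearch and EFetch request URLs directly. The helper names `esearch_url` and `efetch_url` are hypothetical, and the sketch assumes the `gp` return type for GenPept flat files in the protein database together with the history server (`usehistory`, `WebEnv`, `query_key`); please check the E-utilities documentation before relying on it.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """Build an ESearch URL that stores the result set on the history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, web_env, query_key, retstart=0, retmax=500):
    """Build an EFetch URL for one batch of GenPept records (rettype=gp)."""
    params = {
        "db": db,
        "WebEnv": web_env,
        "query_key": query_key,
        "rettype": "gp",     # assumed GenPept flat-file return type
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Only prints the URLs; no request is sent.
    print(esearch_url("protein", "txid10239[Organism]"))
    print(efetch_url("protein", "WEBENV_FROM_ESEARCH", "1"))
```

The `WebEnv` and `query_key` values would come from the ESearch response; the placeholder strings above are not real values.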
For the past three years, I have downloaded sequences using the Batch Entrez function. However, I noticed that some batches come back empty. I mention this just in case you are not aware of it.
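One possible way to guard against empty batches is to retry a batch when it comes back empty and to record the offsets that still fail. The sketch below is only an assumption about how this could be handled, not how NCBI_downloader.py does it; `fetch_all_batches` and its `fetch_batch` callback are hypothetical names, and the callback stands in for whatever actually performs the download.

```python
import time


def fetch_all_batches(fetch_batch, total, batch_size=500,
                      max_retries=3, wait_seconds=1.0):
    """Fetch `total` records in batches, retrying batches that return nothing.

    `fetch_batch(start, size)` must return the batch as text.
    Returns (chunks, failed_starts).
    """
    chunks = []
    failed_starts = []
    for start in range(0, total, batch_size):
        text = ""
        for attempt in range(max_retries):
            text = fetch_batch(start, batch_size)
            if text.strip():
                break
            # Empty batch: wait a little longer before each new attempt.
            time.sleep(wait_seconds * (attempt + 1))
        if text.strip():
            chunks.append(text)
        else:
            # Still empty after all retries; remember the offset for later.
            failed_starts.append(start)
    return chunks, failed_starts
```

Keeping the list of failed offsets makes it possible to report or re-request exactly the missing ranges instead of silently writing an incomplete file.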
As I also miss notifications sometimes, perhaps we could continue the conversation by email ([email protected]), if you don't mind.
Best regards, CHONG
Dear Prof. Francisco,
It works well when tested with a small dataset download (txid: 12637).
The command used:
python3 NCBI_downloader.py -q "txid12637[Organism:exp]" -d "protein" -nv -o denv.txt -f gb
Please feel free to close the issue. Thank you.
Best regards, CHONG
Dear @ChongLC, Great to hear it seems to be working. I will keep the issue open until I turn this makeshift version into a real integrated part of the program.
Best, Francisco