
The download of reference genomes

Open lipumpkin opened this issue 2 years ago • 2 comments

Hi Professor fasnicar, I have a question about the -g option of phylophlan_get_reference. I downloaded reference genomes for the genus Acinetobacter with this command (phylophlan_get_reference -g g__Acinetobacter -o input_genomes/ -n 1 --verbose 2>&1 | tee logs/phylophlan_get_reference.log) and ended up with 227 genomes for this genus. However, the file assembly_summary_genbank.txt shows over 10,000 species belonging to the genus Acinetobacter. I then tried the same command with -n 300 and got 806 genomes. On what basis were these 227 or 806 genomes selected? And do they include all child taxa (species) of the genus with validly published names?
Thanks

lipumpkin, Mar 22 '22 09:03

Hi, the -n parameter is an "up to" limit that applies to each species separately. As an example, suppose you run:

phylophlan_get_reference -g g__Acinetobacter -o input_genomes/ -n 5

then up to 5 genomes will be downloaded for each species listed under g__Acinetobacter. Now, again for the sake of the example, assume that the genus contains only 3 species, listed here with their number of available genomes:

g__Acinetobacter|s__species_1    3
g__Acinetobacter|s__species_2    15
g__Acinetobacter|s__species_3    6

In total there are 24 genomes, but you end up downloading only 13: all 3 from s__species_1, which has fewer than 5, and 5 each from s__species_2 and s__species_3.
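The arithmetic of this toy example can be sketched in a few lines of Python. This only models the "up to n per species" rule described here; it is not PhyloPhlAn's own code, and the function name `genomes_downloaded` and the dictionary layout are hypothetical, chosen for illustration.

```python
def genomes_downloaded(genomes_per_species, n):
    """Count the downloads when at most n genomes are taken per species.

    Hypothetical helper: it models the "up to n" rule from the example,
    not the actual implementation inside phylophlan_get_reference.
    """
    return sum(min(count, n) for count in genomes_per_species.values())


# Toy counts from the example (species names are placeholders)
example = {
    "s__species_1": 3,
    "s__species_2": 15,
    "s__species_3": 6,
}

total_available = sum(example.values())      # 3 + 15 + 6
downloaded = genomes_downloaded(example, 5)  # 3 + 5 + 5

print(total_available, downloaded)  # prints "24 13"
```

Calling the same helper with n=1 would give one genome per species, that is 3 for this toy genus.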

Now, if you check phylophlan_get_reference -l | grep "g__Acinetobacter" | less -S you'll find:

k__Bacteria|p__Proteobacteria|[..]|f__Moraxellaceae|g__Acinetobacter       227     2984

The above means that there are 227 species listed under g__Acinetobacter and that 2984 genomes can be retrieved in total. So it makes sense that you downloaded 227 genomes with -n 1 (one per species) and 806 with -n 300: s__Acinetobacter_baumannii alone has 2478 genomes, so it is capped at 300, while the remaining species are downloaded in full.
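As a sanity check, both reported totals can be reproduced from the three numbers quoted here alone (227 species, 2984 genomes, 2478 of them s__Acinetobacter_baumannii). The sketch below is plain arithmetic under one assumption, namely that every listed species has at least one genome; the variable names are illustrative and nothing here calls PhyloPhlAn.

```python
# Numbers quoted in this thread
n_species = 227         # species listed under g__Acinetobacter
total_genomes = 2984    # genomes retrievable for the genus
baumannii = 2478        # genomes of s__Acinetobacter_baumannii

# Everything that is not A. baumannii
other_species = n_species - 1            # 226 species
other_genomes = total_genomes - baumannii  # 506 genomes

# Assumption: each listed species has at least one genome. Then the
# largest any single other species can be is what remains after giving
# one genome to each of the remaining 225 species.
max_other = other_genomes - (other_species - 1)  # 281, below the cap of 300

# -n 1: exactly one genome per species
with_n1 = n_species

# -n 300: A. baumannii is capped, every other species is taken in full
with_n300 = min(baumannii, 300) + other_genomes

print(with_n1, with_n300)  # prints "227 806"
```

Since no other species can hold more than 281 genomes, the cap of 300 only ever applies to s__Acinetobacter_baumannii, which is why the second total is exactly 300 + 506.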

I hope this helps.

Thanks, Francesco

fasnicar, Mar 29 '22 13:03

Hi, thank you very much.

I now fully understand the meaning of the -n parameter. Your answer has really helped me understand this code better.

Thanks, Zikun

lipumpkin, Apr 02 '22 08:04