Bracken
Impact of genome size
Hi, I am wondering whether Bracken factors bacterial genome size into its calculation of relative abundance. For a defined consortium of a few known bacteria with a large variation in genome size, the relative abundance of each strain will be very different depending on whether genome size is taken into account. I also see that estimating the genome size of detected strains in a complex microbiome, such as the gut microbiome, would be challenging. Can you help clarify how Bracken deals with genome size? Thanks.
I came here to ask the exact same question! Looking forward to reading @jenniferlu717's comments.
What I understand is that the Bracken output is the relative abundance not of species, but of reads aligning to species, and that these values are not normalised by genome length.
Genome size is not included in the abundance estimation.
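To illustrate the difference, here is a minimal sketch of a genome-size correction applied after the fact. It is not part of Bracken: the function name, the strain names, and all read counts and genome sizes are made up for the example, and it assumes you already know one genome size per taxon.

```python
def genome_normalized_abundance(read_counts, genome_sizes):
    """Convert per-taxon read counts into genome-size-normalized fractions.

    Hypothetical helper, not a Bracken function. Each read count is divided
    by the taxon's genome size (reads per base), then rescaled to sum to 1.
    """
    per_base = {t: read_counts[t] / genome_sizes[t] for t in read_counts}
    total = sum(per_base.values())
    return {t: v / total for t, v in per_base.items()}


# Made-up two-strain consortium: equal cell numbers, 3x difference in genome size.
reads = {"strain_A": 6000, "strain_B": 2000}
sizes = {"strain_A": 6_000_000, "strain_B": 2_000_000}  # base pairs

# Plain read fraction, i.e. without any genome-size correction.
total_reads = sum(reads.values())
read_fraction = {t: n / total_reads for t, n in reads.items()}

normalized = genome_normalized_abundance(reads, sizes)
```

With these invented numbers the read fractions are 0.75 and 0.25, while the genome-size-normalized fractions are 0.5 and 0.5, which is the kind of gap the original question describes for a consortium with very different genome sizes.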
It becomes more difficult when estimating species- or genus-level abundance, as the genome sizes within those clades may vary, so normalization would be biased.
that makes sense, thanks for taking the time
Hi. I would like to understand better why normalization would be biased if genome sizes within a species vary. Isn't it exactly because genome size varies that we need to normalize by genome size?
If I understand correctly @sentausa, what @jenniferlu717 meant is that within a genus, species will have various genome lengths. If the read abundance is only distributed down to genus by Bracken, then knowing which genome size to normalise with becomes tricky. The same goes for species abundance, because strains may have different genome sizes, but I wouldn't have thought that it would vary enough within a species to make a difference. Please correct me if I'm wrong!
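A rough sketch of why the choice of genome size matters at genus level, under invented numbers: two genera with the same read count, where the species in `genus_X` are assumed to range from 2 Mb to 5 Mb. None of this is Bracken code, and the names and values are purely illustrative.

```python
def normalized_fraction(read_counts, genome_sizes, taxon):
    """Genome-size-normalized fraction of one taxon (hypothetical helper)."""
    per_base = {t: read_counts[t] / genome_sizes[t] for t in read_counts}
    return per_base[taxon] / sum(per_base.values())


# Made-up genus-level read counts.
reads = {"genus_X": 5000, "genus_Y": 5000}

# Assumed genome sizes (bp): genus_Y is fixed, genus_X is tried at both
# ends of its assumed within-genus range.
genus_y_size = 4_000_000
x_with_smallest = normalized_fraction(
    reads, {"genus_X": 2_000_000, "genus_Y": genus_y_size}, "genus_X"
)
x_with_largest = normalized_fraction(
    reads, {"genus_X": 5_000_000, "genus_Y": genus_y_size}, "genus_X"
)
```

Here `genus_X` comes out at about 0.67 when normalised with the smallest genome and about 0.44 with the largest, even though the reads are identical. The narrower the size range within a clade, the smaller that spread, which fits the intuition that it should matter less within a species than within a genus.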