FastANI
How to interpret n mapped fragments?
Hello
I have a question about the fastANI output, e.g. Genome1 genome2 0.9 60 100
Is 0.9 the estimated ANI over the whole genome, or only over the aligned fragments?
How can we interpret the ratio of mapped to total fragments? Does 60/100 mean the genomes overlap by 60%?
If I have e.g. 5 mapped out of 100, can I trust the ANI calculation?
I work with MAGs, so maybe I need to be more cautious. Thanks for the clarifications.
ANI is computed over the aligned (or conserved) fraction of genomes. That's how it's been defined in the early papers.
You're right, 60 out of 100 fragments in the query genome (Genome 2) have been mapped to Genome 1. FastANI has an internal threshold of minimum 50 fragments to avoid incorrect ANI estimation from just a few matching fragments.
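To make the mapped/total ratio concrete, here is a minimal Python sketch that parses one output line and computes the fraction of fragments that mapped. It assumes the five-column layout used in the example above (two genome names, ANI, mapped fragments, total fragments) and whitespace-separated fields; the names `AniHit`, `parse_hit` and `mapped_fraction` are made up for illustration and are not part of FastANI.

```python
from typing import NamedTuple


class AniHit(NamedTuple):
    """One line of output, using the column order from the example above."""
    query: str
    reference: str
    ani: float
    mapped: int   # fragments that were mapped
    total: int    # total fragments

    @property
    def mapped_fraction(self) -> float:
        # Share of fragments that mapped; 0.0 if there were no fragments.
        return self.mapped / self.total if self.total else 0.0


def parse_hit(line: str) -> AniHit:
    # Assumes whitespace-separated fields and no spaces inside genome names.
    query, reference, ani, mapped, total = line.split()
    return AniHit(query, reference, float(ani), int(mapped), int(total))


hit = parse_hit("Genome1 genome2 0.9 60 100")
print(hit.ani, hit.mapped_fraction)
```

The ANI column and the mapped fraction answer different questions: the first is identity over what aligned, the second is how much aligned at all.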
Sorry, I have a very basic question about how fastANI works. "FastANI has an internal threshold of minimum 50 fragments to avoid incorrect ANI estimation from just a few matching fragments." Since --fragLen=3000 by default, does this mean the ANI is considered reliable only when there are at least 50 fragments of length >= 3000 bp?
For earlier versions of FastANI, what you said is true. Hope you are using the latest available FastANI version now.
Since v1.3 (see https://github.com/ParBLiSS/FastANI/releases), we have revised this criterion. With the new version, the help page (fastani -h) shows a --minFraction parameter, which sets the minimum fraction of sequence that the two genomes must share for the ANI score to be considered reliable.
Hi, I have a question about how to sensibly weigh both identity and coverage when assigning my assembled genome against a reference database. For example:
genome1 genome2 0.9 60 100
genome1 genome3 0.8 90 100
Which one is closer to my query, genome1?
As far as I'm aware, candidates are typically ranked by just identity.
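As a rough sketch of that ranking, the Python snippet below sorts the two example hits by identity and optionally drops hits whose mapped fraction is too low first. The `min_mapped_fraction` cutoff and the function name `rank_hits` are hypothetical, added only for illustration; they are not a FastANI option or a recommended value.

```python
# Example hits from the question: (query, reference, ANI, mapped, total).
hits = [
    ("genome1", "genome2", 0.9, 60, 100),
    ("genome1", "genome3", 0.8, 90, 100),
]


def rank_hits(hits, min_mapped_fraction=0.0):
    """Rank hits by ANI, highest first.

    min_mapped_fraction is a hypothetical, user-chosen cutoff on
    mapped/total fragments; the default of 0.0 keeps every hit.
    """
    kept = [
        h for h in hits
        if h[4] > 0 and h[3] / h[4] >= min_mapped_fraction
    ]
    return sorted(kept, key=lambda h: h[2], reverse=True)


# Ranking by identity alone puts genome2 first.
print([h[1] for h in rank_hits(hits)])

# With a cutoff of 0.7 on the mapped fraction, only genome3 remains.
print([h[1] for h in rank_hits(hits, min_mapped_fraction=0.7)])
```

Whether to apply such a cutoff at all is a judgment call that depends on the dataset, for example on how complete the MAGs are.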
Can I get the ratio of mapped to total fragments for the reference genome? I would like to exclude genomes much smaller than the reference genome.
You can perhaps exchange the values given to -q and -r.
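A minimal sketch of that swap in Python: it only builds the two command lines, one per direction, so the fragment counts can be compared once both runs finish. The file names are placeholders, the executable name `fastANI` is an assumption about how the tool is installed, and `fastani_commands` is a helper made up for this example.

```python
def fastani_commands(genome_a, genome_b, binary="fastANI"):
    """Build argument lists for both directions of a pairwise comparison.

    The binary name and the output file names are assumptions for
    illustration; adjust them to your installation.
    """
    forward = [binary, "-q", genome_a, "-r", genome_b, "-o", "a_vs_b.txt"]
    reverse = [binary, "-q", genome_b, "-r", genome_a, "-o", "b_vs_a.txt"]
    return forward, reverse


forward, reverse = fastani_commands("my_mag.fna", "reference.fna")
print(" ".join(forward))
print(" ".join(reverse))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually run the comparison. The total-fragments column in the second output then refers to the genome that was the reference in the first run.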