FastANI

How to interpret n mapped fragments?

Open SilasK opened this issue 5 years ago • 7 comments

Hello

I have a question about the FastANI output, e.g.: Genome1 genome2 0.9 60 100

Is 0.9 the estimated ANI over the whole genome, or only over the aligned fragments?

How should we interpret the ratio of mapped to total fragments? Does 60/100 mean that the genomes overlap by 60%?

If I have, e.g., 5 mapped out of 100, can I trust the ANI calculation?

I work with MAGs, so maybe I need to be more cautious. Thanks for the clarifications.

SilasK avatar May 15 '19 20:05 SilasK

ANI is computed over the aligned (or conserved) fraction of genomes. That's how it's been defined in the early papers.

You're right, 60 out of 100 fragments in the query genome (the first column of the output, Genome 1 here) have been mapped to the reference genome (Genome 2). FastANI has an internal threshold of a minimum of 50 fragments to avoid incorrect ANI estimation from just a few matching fragments.
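To make the two fragment counts concrete, here is a minimal Python sketch that parses one output row and computes the mapped fraction. It assumes the tab-delimited column order given in the FastANI README (query, reference, ANI, mapped fragments, total query fragments). The names `FastAniRow` and `parse_row` are made up for illustration and are not part of FastANI.

```python
from typing import NamedTuple


class FastAniRow(NamedTuple):
    """One row of FastANI output (column order is an assumption, see above)."""
    query: str
    reference: str
    ani: float
    mapped_fragments: int
    total_fragments: int

    @property
    def mapped_fraction(self) -> float:
        # Share of query fragments that were mapped to the reference.
        return self.mapped_fragments / self.total_fragments


def parse_row(line: str) -> FastAniRow:
    """Parse a single tab-delimited output line."""
    query, reference, ani, mapped, total = line.rstrip("\n").split("\t")[:5]
    return FastAniRow(query, reference, float(ani), int(mapped), int(total))


# The example row from the question above.
row = parse_row("genome1\tgenome2\t0.9\t60\t100")
print(row.ani, row.mapped_fraction)
```

The ANI value describes only the mapped fragments, so the mapped fraction is the number to look at when judging how much of the query those fragments represent.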

cjain7 avatar Jun 06 '19 03:06 cjain7

Sorry, I have a very basic question about how FastANI works. You wrote: "FastANI has an internal threshold of minimum 50 fragments to avoid incorrect ANI estimation from just a few matching fragments." With the default --fragLen=3000, does this mean that the ANI is considered reliable only when there are at least 50 fragments of length >= 3000 bp?

limin321 avatar Jul 31 '20 23:07 limin321

For earlier versions of FastANI, what you said is true. I hope you are using the latest available FastANI version now. Since v1.3 (see https://github.com/ParBLiSS/FastANI/releases), we have revised this criterion. With the new version, the help page (fastANI -h) shows a --minFraction parameter, which specifies the minimum fraction of the two genomes that must be shared between them for the ANI score to be considered reliable.
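As a rough illustration of the idea behind a fraction-based cutoff, the sketch below flags a result as unreliable when too small a share of fragments was mapped. This is a user-side filter applied to the output columns, not FastANI's internal logic. The function name `is_reliable` and the 0.2 threshold are assumptions for the example; check `fastANI -h` for the actual default and definition of `--minFraction`.

```python
def is_reliable(mapped_fragments: int, total_fragments: int,
                min_fraction: float = 0.2) -> bool:
    """Return True if the mapped share of fragments reaches min_fraction.

    min_fraction=0.2 is an illustrative value, not a confirmed default.
    """
    if total_fragments <= 0:
        return False
    return mapped_fragments / total_fragments >= min_fraction


# Cases taken from the questions in this thread.
print(is_reliable(60, 100))  # 60 of 100 fragments mapped
print(is_reliable(5, 100))   # 5 of 100 fragments mapped
```

With this threshold the 60/100 case passes and the 5/100 case does not, which matches the intuition that an ANI estimated from a handful of fragments should be treated with caution.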

cjain7 avatar Aug 02 '20 22:08 cjain7

Hi, I have a question about how to weigh both identity and coverage when assigning my assembled genome to a reference database. Say the results are as follows:

genome1   genome2   0.9   60   100
genome1   genome3   0.8   90   100

Which one is closer to my query, genome1?

ZhangDengwei avatar Aug 20 '20 02:08 ZhangDengwei

As far as I'm aware, candidates are typically ranked by just identity.
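A minimal sketch of ranking by identity alone, using the two example rows from the question. The tuple layout (query, reference, ANI, mapped fragments, total fragments) is an assumption for this example; the mapped fraction is printed next to each hit only for context and does not affect the order.

```python
# Example hits copied from the question above.
hits = [
    ("genome1", "genome2", 0.9, 60, 100),
    ("genome1", "genome3", 0.8, 90, 100),
]

# Sort by the ANI column only, highest identity first.
ranked = sorted(hits, key=lambda hit: hit[2], reverse=True)

for query, reference, ani, mapped, total in ranked:
    print(f"{reference}\tANI={ani}\tmapped={mapped / total:.2f}")

best_reference = ranked[0][1]
```

Under this ranking genome2 comes first, even though a smaller share of fragments mapped to it than to genome3.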

cjain7 avatar Aug 20 '20 02:08 cjain7

Is it possible to get the ratio of mapped to total fragments for the reference genome? I would like to exclude genomes that are much smaller than the reference genome.

chen1i6c04 avatar Sep 08 '20 00:09 chen1i6c04

You could perhaps swap the values given to -q and -r.
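One way to picture the swap is to build both invocations side by side. The sketch below only constructs the command lines and does not run FastANI. The helper `fastani_command` and the file names are hypothetical; `-q`, `-r`, and `-o` are the query, reference, and output options.

```python
def fastani_command(query: str, reference: str, output: str) -> list[str]:
    """Build a fastANI argument list for one query/reference pair."""
    return ["fastANI", "-q", query, "-r", reference, "-o", output]


# Hypothetical file names for illustration.
forward = fastani_command("my_mag.fna", "reference.fna", "forward.txt")
# Swapped roles: fragment counts now refer to the former reference genome.
swapped = fastani_command("reference.fna", "my_mag.fna", "swapped.txt")

print(" ".join(forward))
print(" ".join(swapped))
```

Each list can be passed to `subprocess.run` if FastANI is installed. Comparing the mapped and total fragment counts from the two output files then shows how much of each genome is covered by the other.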

cjain7 avatar Sep 08 '20 06:09 cjain7