
easy-taxonomy's species suggestion not confirmed by alignment or mash

Status: open. Opened by xfengnefx, 0 comments.

Expected Behavior

If easy-taxonomy annotated an input contig at species level (in the lca.tsv file), I expected the contig to align to, or at least show similarity with, an existing assembly of that species. In my case it did not, and I am not sure how to interpret this.

Current Behavior & Context

I assembled a rather unusual library. Some circular contigs got poor scores from CheckM. I checked them with MMseqs2 easy-taxonomy (default parameters, Uniprot90) and found 3 contigs classified at species level. All three species have reference assemblies in RefSeq/GenBank, so I downloaded those assemblies and aligned my contigs to them (minimap2 -c -xasm20): one contig did not align at all, and the other two had very few hits. Running mash dist (default parameters) confirmed this result.
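To put a number on "hardly had hits", one option is to measure what fraction of each contig is covered by at least one alignment in minimap2's PAF output. The sketch below assumes the standard PAF layout (column 1 query name, 2 query length, 3 query start, 4 query end, 0-based half-open); `query_coverage` is a hypothetical helper, not part of minimap2 or MMseqs2. Contigs with no alignment lines simply do not appear in the result, which corresponds to 0% coverage.

```python
def query_coverage(paf_lines):
    """Hypothetical helper: fraction of each query covered by >= 1 PAF alignment.

    Assumes standard PAF columns: qname, qlen, qstart, qend (0-based, end-exclusive).
    """
    intervals = {}
    lengths = {}
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        qname = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        lengths[qname] = qlen
        intervals.setdefault(qname, []).append((qstart, qend))

    coverage = {}
    for qname, ivs in intervals.items():
        ivs.sort()
        covered = 0
        cur_start, cur_end = ivs[0]
        # Merge overlapping intervals so overlapping hits are not double-counted.
        for start, end in ivs[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
        coverage[qname] = covered / lengths[qname]
    return coverage


if __name__ == "__main__":
    # Two overlapping hits on a 1000 bp contig merge into [0, 400).
    paf = [
        "ctg1\t1000\t0\t200\t+\tref\t5000\t0\t200\t190\t200\t60",
        "ctg1\t1000\t150\t400\t+\tref\t5000\t300\t550\t240\t250\t60",
    ]
    print(query_coverage(paf))  # {'ctg1': 0.4}
```

In practice the lines would come from reading the `.paf` file produced by `minimap2 -c -xasm20 ref.fa contigs.fa`.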

I checked GC skew, which visually suggested that one contig might be misassembled (two peaks; CheckM gave 100% completeness and 100% contamination), but the other two looked fine (one peak; not contaminated, but also <70% complete).
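For reference, a GC skew check like this one can be reproduced with a short script. This is a minimal sketch, assuming non-overlapping windows (the window size and the helper name `gc_skew` are illustrative choices, not from the original post). It computes the per-window skew (G - C) / (G + C) and its running sum. For a correctly assembled circular chromosome, the cumulative skew typically shows one minimum and one maximum (near the origin and terminus of replication), which is why extra peaks can hint at a misassembly.

```python
from itertools import accumulate


def gc_skew(seq, window=1000):
    """Hypothetical helper: per-window GC skew and its cumulative sum.

    Uses non-overlapping windows; a trailing partial window is ignored.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C contribute zero skew.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    cumulative = list(accumulate(skews))
    return skews, cumulative


if __name__ == "__main__":
    skews, cumulative = gc_skew("G" * 10 + "C" * 10, window=10)
    print(skews)       # [1.0, -1.0]
    print(cumulative)  # [1.0, 0.0]
```

Plotting `cumulative` against window index gives the usual cumulative GC skew curve.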

The Question

Could this mean that the contigs are wrong and just happened to receive a species-level annotation from MMseqs2, or not necessarily? What else could I check, or is it hard to draw a conclusion here? Thank you!

Your Environment

mmseqs2: 45111b641859ed0ddd875b94d6fd1aef1a675b7e, statically compiled. I'm on a CentOS/Ubuntu server; the run produced no warnings.

edit: minor typo

Posted by xfengnefx, Jul 20 '21 15:07