sourmash icon indicating copy to clipboard operation
sourmash copied to clipboard

sourmash tax metagenome krona and kreports are vastly different?

Open RhettRautsaw opened this issue 7 months ago • 2 comments
trafficstars

I am running sourmash v4.8.14 on a PacBio HiFi long read dataset. It is a test dataset with ~50% representation from two cultured taxa.

I am using this snakemake workflow (https://github.com/PacificBiosciences/pb-metagenomics-tools/tree/master/Taxonomic-Profiling-Sourmash) which essentially is running:

sourmash tax metagenome -g {input.gather} -t {input.lineages} -o {params.out_base} \
                                --output-dir {params.outd} --output-format krona csv_summary kreport \
                                --rank species 2> {log}

The resulting krona and kreports are vastly different with the krona report suggesting ~53% unclassified and the kreport only suggesting ~4% unclassified (which I tend to believe more). However, I could just not quite understand how the krona report is generated. So just looking to shed some light on why these two reports are so different.

If I put the results into krona for visualization, the two files would look like this:

kreport Image

krona Image

RhettRautsaw avatar Mar 27 '25 20:03 RhettRautsaw

hi @RhettRautsaw sorry for delay - it is almost certainly due to the krona report not using abundances; see https://github.com/sourmash-bio/sourmash/issues/3577 for context. I will look into it!

cc @bluegenes

ctb avatar Apr 02 '25 17:04 ctb

Hi @ctb. No problem! Thanks for taking the time to look into this and providing some insight.

RhettRautsaw avatar Apr 02 '25 18:04 RhettRautsaw

This is being resolved in https://github.com/sourmash-bio/sourmash/pull/3711, which was a bit of a beast to make sense of. But I think I understand it all now and have both documented and tested it 🎉 !

In brief, krona format was not using abundances, while kreport was. As of v4.9.4, appropriate warnings will be printed, and --use-abund will drive consistent reporting across all formats.

You may also be interested in https://github.com/taxburst/taxburst if you're using krona, BTW :). #advertisement

ctb avatar Jul 31 '25 14:07 ctb

sourmash v4.9.4 has been released to a pypi and a conda-forge near you!

ctb avatar Aug 09 '25 15:08 ctb

Thanks @ctb!

RhettRautsaw avatar Oct 30 '25 01:10 RhettRautsaw