Cut ASVs for taxonomy assignment

Open · erikrikarddaniel opened this issue 4 years ago · 5 comments

In some cases, a user may have sequenced an amplicon that is longer than the sequences in the reference database they want to use. To make taxonomy assignment work in that situation, ASV sequences could be cut down to the region covered by the database before assignment.

erikrikarddaniel, Feb 23 '21

Interesting. Do you have an example of when that would be the case, to illustrate the problem? With what parameters would you cut the ASV sequences? A degenerate nucleotide sequence? Nucleotide positions?

d4straub, Feb 23 '21
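The two options raised above (cutting at a degenerate nucleotide sequence or at fixed nucleotide positions) could look roughly like this. This is a minimal, hypothetical sketch, not how ampliseq implements anything: the function names are made up, primers are assumed to be given 5'→3' with IUPAC codes, the reverse primer is reverse-complemented to find it in the ASV's orientation, and matching is exact (first hit only). A real implementation would more likely rely on a tool such as cutadapt, which tolerates mismatches.

```python
import re

# Hypothetical helper table: IUPAC degenerate codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def degenerate_to_regex(primer: str) -> str:
    """Turn a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def cut_by_sequence(asv: str, fwd_primer: str, rev_primer: str):
    """Return the ASV part between the forward primer and the reverse
    primer's site (primers excluded), or None if either is not found.
    Assumption: exact matching only, first occurrence wins."""
    fwd = re.search(degenerate_to_regex(fwd_primer), asv)
    if fwd is None:
        return None
    downstream = asv[fwd.end():]
    rev = re.search(degenerate_to_regex(revcomp(rev_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


def cut_by_position(asv: str, start: int, end: int) -> str:
    """Return ASV positions start..end-1 (0-based, end-exclusive)."""
    return asv[start:end]


if __name__ == "__main__":
    # Toy ASV: forward site GTGC matches GTRC, reverse site CCAA = revcomp(TTGG).
    print(cut_by_sequence("TTTGTGCACGTACGCCAATTT", "GTRC", "TTGG"))  # ACGTACG
    print(cut_by_position("ACGTACGT", 2, 6))  # GTAC
```

Position-based cutting is trivial but only works if all ASVs are aligned to the same coordinates; sequence-based cutting is more robust to length variation, which is why it is usually preferred for amplicons.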

People are apparently sequencing whole rRNA operons, but most databases are limited to a single gene, or the ITS region, per sequence. To assign taxonomy, one would therefore have to cut the ASV down to the region covered by a particular database. The alternative would be to trust that the k-mer distribution is the same, but I don't think that would work well.

erikrikarddaniel, Feb 24 '21

We have sequenced more or less the whole rRNA operon in fungi, but since UNITE (mostly) contains only the ITS region, we need to cut the resulting ASVs and use only the ITS (or even just the ITS2) region for taxonomy assignment. For this we use ITSx (https://microbiology.se/software/itsx/), which works for fungi as well as for other taxonomic groups. Would it be an option to include this as an optional step, e.g. with a parameter --cut_its?

jtangrot, May 03 '21

I suppose we were thinking of something general, and this sounds specific to ITS. On the other hand, it is better to have something that works for the only use case I'm aware of than nothing, so in my opinion, go ahead and add it.

erikrikarddaniel, May 03 '21

So is this solved?

d4straub, May 28 '21