ampliseq

Implement translation filtering for protein coding markers

Open tjcreedy opened this issue 2 years ago • 2 comments

Description of feature

As mentioned in #449, filtering out ASVs whose translations contain stop codons can have a small but very precise impact on ensuring that only correct ASVs are retained. Adding this to the suite of post-DADA2 filtering steps, as a protein-coding gene (PCG) alternative to barrnap and ITSx, might be valuable.

I'm not proficient enough with Nextflow, but I can offer Python code if that would be useful. I currently use my own script for this.

Other information, such as structural information gleaned from the translation, could potentially also be used in filtering. My own experiments with this haven't been very successful, but it has been used in some obscure cases.
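
As a rough illustration of the stop-codon filter described above, here is a minimal pure-Python sketch. It is not the script mentioned in this issue and not the implementation in ampliseq. The function names (`count_stops`, `filter_asvs`) and the rule "keep an ASV if at least one forward reading frame is stop-free" are assumptions made for the example. The stop codon sets are those of NCBI translation tables 1, 2 and 5; for COX1 from invertebrates, table 5 would be the usual choice.

```python
# Stop codons for a few NCBI translation tables (only those needed here).
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial
    5: {"TAA", "TAG"},                # invertebrate mitochondrial
}


def count_stops(seq, frame, table=5):
    """Count stop codons in `seq` when read from 0-based offset `frame`."""
    stops = STOP_CODONS[table]
    seq = seq.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    return sum(seq[i:i + 3] in stops for i in range(frame, len(seq) - 2, 3))


def filter_asvs(asvs, table=5, frame=None):
    """Split ASVs into (kept, rejected) dicts of name -> sequence.

    If `frame` is None, an ASV is kept when any of the three forward
    frames is free of stop codons (an assumption for this sketch).
    If `frame` is 0, 1 or 2, only that frame is checked.
    """
    frames = (0, 1, 2) if frame is None else (frame,)
    kept, rejected = {}, {}
    for name, seq in asvs.items():
        if any(count_stops(seq, f, table) == 0 for f in frames):
            kept[name] = seq
        else:
            rejected[name] = seq
    return kept, rejected


if __name__ == "__main__":
    # Toy sequences, made up for the example.
    example = {"asv1": "ATGGCATGAGGA", "asv2": "TAATTAGTTAA"}
    kept, rejected = filter_asvs(example, table=5)
    print("kept:", sorted(kept))
    print("rejected:", sorted(rejected))
```

When the amplicon's reading frame is known from the primers, passing `frame` explicitly is stricter than the any-frame rule, since a short erroneous sequence can be stop-free in some frame by chance.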

tjcreedy avatar Jun 09 '22 11:06 tjcreedy

That's a nice suggestion, I think. Ideally we would use an existing biocontainer of a dedicated tool. Do you know of any such tool? I guess not, otherwise you wouldn't have written your own script?! If such a tool doesn't exist, your script could come in handy!

d4straub avatar Jun 13 '22 14:06 d4straub

To the best of my knowledge, nobody doing COX1 metabarcoding has used a straightforward CLI tool like this*, probably because it's relatively easy to do in a GUI tool or with some quick code. So yes, that's why I wrote my own code for it. I'd be very happy to modify this code as needed to work inside Nextflow; it would be good experience for me in learning more Nextflow. This could probably work within a Biopython container, in a similar way to how you implement DADA2? I have also thought of containerising metamate at some point, which would include this tool. If you'd prefer this route, it's something I can look into doing.

*see the supplement of this recent paper

tjcreedy avatar Jun 13 '22 15:06 tjcreedy

PR #575 resolves this issue: ASVs can now be filtered for stop codons.

lokeshbio avatar May 16 '23 13:05 lokeshbio

Thanks @lokeshbio !

d4straub avatar May 17 '23 08:05 d4straub

If I was too quick in closing this issue, @tjcreedy (i.e. because it wasn't solved completely), please open it again!

d4straub avatar May 17 '23 08:05 d4straub

Just had a brief look over the PR and it looks like it tackles all of these - thanks so much @lokeshbio and @d4straub. I'll put this on my todo list to test with my various COX1 testing datasets - I'm sure it'll work great!

tjcreedy avatar May 17 '23 08:05 tjcreedy