ampliseq
Implement translation filtering for protein coding markers
Description of feature
As mentioned in #449, filtering out ASVs whose translation contains stop codons can make a small but very accurate contribution to ensuring that the retained ASVs are correct. Adding this to the suite of post-DADA2 filtering steps, as a protein-coding gene (PCG) alternative to barrnap and ITSx, might be valuable.
I'm not proficient enough with Nextflow, but I can offer Python code if that would be useful; I currently use my own script for this.
Potentially, other information gleaned from the translation, such as structural information, could also be used in filtering. My own experiments with this haven't been too successful, but it has been used in some obscure cases.
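To make the request concrete, here is a minimal sketch of the stop-codon check described above. It is not my actual script and not what the pipeline does: the function names, the `frame` and `table` parameters, and the example sequences are all hypothetical, and it assumes the reading frame of the amplicon is already known. Stop codons are listed for NCBI translation tables 1, 2 and 5 only.

```python
# Stop codons per NCBI translation table (only the tables used in this sketch).
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},          # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},   # vertebrate mitochondrial
    5: {"TAA", "TAG"},                 # invertebrate mitochondrial
}


def has_stop(seq, table=5, frame=0):
    """Return True if any complete codon in the given frame is a stop codon.

    `frame` is the 0-based offset of the first codon (an assumption: the
    caller must know where the reading frame starts in the amplicon).
    """
    stops = STOP_CODONS[table]
    seq = seq.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return True
    return False


def filter_asvs(asvs, table=5, frame=0):
    """Keep ASVs (dict of id -> sequence) that have no in-frame stop codon."""
    return {name: s for name, s in asvs.items() if not has_stop(s, table, frame)}


# Made-up example sequences, for illustration only.
asvs = {
    "ASV_1": "ATGGCATGAGGATTT",  # TGA codes for Trp in table 5, so no stop
    "ASV_2": "ATGGCATAAGGATTT",  # TAA in frame 0 is a stop
}
kept = filter_asvs(asvs, table=5, frame=0)
print(sorted(kept))
```

The choice of translation table matters: the same `ASV_1` would be rejected under the standard code (table 1), where TGA is a stop, so the table and frame would need to be user-configurable options.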
That's a nice suggestion, I think. Ideally we would use an existing biocontainer of a dedicated tool. Do you know of any such tool? I guess not, otherwise you wouldn't have written your own script?! If such a tool doesn't exist, your script could come in handy!
To the best of my knowledge, no straightforward CLI tool like this has been used by anyone doing COX1 metabarcoding*, probably because it's relatively easy to do in a GUI tool or with some quick code. So yes, that's why I wrote my own code for it. I'd be very happy to modify this code as needed to work inside Nextflow; it would be good experience for me in learning more Nextflow. Presumably this could work within a Biopython container, in a similar way to how you implement DADA2? I have also thought of containerising metamate at some point, which would include this tool. If you'd prefer this route, it's something I can look into doing.
*see the supplement of this recent paper
PR #575 resolves this issue: ASVs can now be filtered for stop codons.
Thanks @lokeshbio !
If I was too quick in closing this issue, @tjcreedy (i.e. because it wasn't solved completely), please open it again!
Just had a brief look over the PR and it looks like it tackles all of these - thanks so much @lokeshbio and @d4straub. I'll put this on my todo list to test with my various COX1 testing datasets - I'm sure it'll work great!