emg-viral-pipeline
[Feature Request] Conda Distribution for Taxonomic Identification
Hi and hope all is well!
I wanted to ask about the timeline for a conda installation. I understand that PPR-Meta is not conda-installable, which is why the conda implementation is on hold. In the meantime, could the taxonomic identification module be released as a separate conda package?
I think the first step in viral contig identification can be mostly up to personal preference with a wide array of tools out there and more developing (e.g. geNomad, Phamer/PhaBox, etc), but your taxonomic identification protocol is quite unique, useful, and the tools for viral taxonomic annotation are rare!
Since the taxonomic identification doesn't rely on PPR-Meta, I was wondering whether a conda recipe could be generated for it separately. I'm currently building a custom viral calling pipeline for our lab, and I have been struggling to find good taxonomic identification protocols that are conda-installable and can be easily integrated into Snakemake.
Thanks for the great tool!
Best,
Erfan
Hey @erfanshekarriz
I think something like that is already possible, but I have not tested it for a while. You can use the --only annotate (https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/virify.nf#L755C7-L755C19) parameter to skip virus prediction and only run the annotation module on all contigs in your input FASTA (only a length filter applies). Via that, I think you can even use -profile conda.
We might have to bump some tool versions in the conda env files to match the current tool versions in the Docker containers but that would be possible.
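As a rough, untested sketch of what such an annotation-only call could look like when assembled from Python (for example inside a Snakemake rule): the --only annotate and -profile conda options come from the comment above, while the --fasta and --output parameter names, the file names, and the helper function build_annotate_command are assumptions for illustration and should be checked against the pipeline's help output.

```python
import shlex


def build_annotate_command(fasta, outdir, profile="conda"):
    """Assemble the argument list for an annotation-only VIRify run.

    Hypothetical helper: it only builds the command and does not run it.
    """
    return [
        "nextflow", "run", "EBI-Metagenomics/emg-viral-pipeline",
        "--fasta", str(fasta),    # assumed name of the input parameter
        "--output", str(outdir),  # assumed name of the output parameter
        "--only", "annotate",     # skip virus prediction, annotate all contigs
        "-profile", profile,      # conda profile, as suggested above
    ]


# Placeholder file names; replace with real paths.
cmd = build_annotate_command("contigs.fasta", "virify_out")

# Print a copy-pasteable shell command for inspection.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the parameter names have been confirmed.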
Does that help?
I think generating a conda package to cover the whole taxonomic classification part of VIRify is beyond the scope of the pipeline and its implementation. But I also get your point about separating virus contig prediction and taxonomy annotation!