rdrp5?

Open dhoconno opened this issue 11 months ago • 1 comments

Hi there,

I began exploring serratus-lite to profile RNA viruses in wastewater datasets this week using the RNA Virus RdRP Search workflow described at: https://github.com/ababaian/serratus/wiki/Serratus-Lite

When I looked at the reference files described at https://github.com/ababaian/serratus/wiki/Access-Data-Release, I noticed there is an rdrp5.fa file in addition to the rdrp1.fa file. It seems like it is non-redundant with rdrp1 at a quick glance. Are there any downsides to concatenating the rdrp1.fa and rdrp5.fa files and using those as the reference for DIAMOND? Also, I wasn't able to find any information on the content of the rdrp5.fa -- is this information available somewhere?

Also, I'm not sure if this is helpful, but I had a bit of a challenge getting the psummarizer.py script to run correctly under Python 2. With the help of an LLM, I refactored it for Python 3, and it seems to be running correctly; the output is sane and matches my expectations of the data and the underlying .pro file as best I can tell. I don't want to create a PR for it because I have no way of ensuring that its output is 1:1 identical to what you see with the existing Python 2 code (since I can't get that to produce output in my environment). Attaching it here if you want to take a look at it. No worries if not.

Thanks,

dave

serratus_psummarizer.py.zip

dhoconno avatar Jan 24 '25 16:01 dhoconno

Hi @dhoconno,

Sorry about the delay in response.

rdrp5 is in fact non-redundant with rdrp1. You should be able to concatenate them without too many problems. As a sanity check, align rdrp1 against rdrp5 and confirm that no sequences hit one another at high identity (>90% aa).
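A minimal sketch of that concatenation and sanity check, in Python. It is not part of the Serratus codebase: the function names and the output file names are hypothetical, and the filter assumes DIAMOND was run with its default 12-column tabular output (`--outfmt 6`), where the third column is percent identity.

```python
# Hypothetical helpers; file names such as rdrp1_rdrp5.fa are placeholders.
#
# One way to produce the alignment table (standard DIAMOND options):
#   diamond makedb --in rdrp5.fa -d rdrp5
#   diamond blastp -q rdrp1.fa -d rdrp5 --outfmt 6 -o rdrp1_vs_rdrp5.tsv


def read_fasta_ids(path):
    """Return the sequence IDs (first word of each '>' header) in a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                words = line[1:].split()
                ids.append(words[0] if words else "")
    return ids


def concat_fasta(paths, out_path):
    """Concatenate FASTA files; raise ValueError if any sequence ID repeats."""
    seen = set()
    for path in paths:
        for seq_id in read_fasta_ids(path):
            if seq_id in seen:
                raise ValueError(f"duplicate sequence ID: {seq_id}")
            seen.add(seq_id)
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                text = handle.read()
            out.write(text)
            # Make sure the next file's first header starts on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(seen)


def high_identity_hits(tsv_lines, min_pident=90.0):
    """Return (query, subject, pident) for hits above min_pident.

    Assumes default --outfmt 6 columns: qseqid, sseqid, pident, ...
    """
    hits = []
    for line in tsv_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        pident = float(fields[2])
        if pident > min_pident:
            hits.append((fields[0], fields[1], pident))
    return hits
```

For example, `concat_fasta(["rdrp1.fa", "rdrp5.fa"], "rdrp1_rdrp5.fa")` would write the combined reference, and an empty list from `high_identity_hits(open("rdrp1_vs_rdrp5.tsv"))` would suggest the two sets do not overlap at the >90% aa level.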

rdrp5 is a large collection of RdRp sequences that covers much more biodiversity, but it will also have more false positives. It's kind of like a "high sensitivity" mode for divergent viruses, and how best to refine it is still under active development.

In short, rdrp5 is the output of palmscan2 when analyzing the RDVA (https://github.com/ababaian/serratus/wiki/RNA-Deep-Virome-Assemblage) dataset. Hope that clarifies it.

ababaian avatar Jan 29 '25 17:01 ababaian