VirSorter2
Inconsistency: identical sequence not always identified as phage
Hi,
Thank you for creating this tool; it works really well.
I have been using it on a dataset of genomes assembled from both Illumina and MinION reads. In genome 1, it found a prophage on a contig that MOB-suite identifies as a plasmid. The prophage has reasonable metrics: a score of 0.883 and 1 hallmark gene. Following the Sullivan Lab tutorial, CheckV found 27 genes in this region, including 3 viral and 2 host genes, and trimmed the prophage slightly further (final length: 17,614 bp). After the second VirSorter2 run, the score is 0.927, still with 1 hallmark gene.
The same (or a very similar) plasmid is present in two of my other genomes (genomes 2 and 3), but no phage was identified in either, which surprised me. I took the prophage identified in genome 1 and BLASTed it against genomes 2 and 3, and got hits with 100% identity over the entire length of the phage. So I am very confused about why the exact same sequence was identified as a phage in genome 1 but not in genomes 2 and 3. Do you have any hints about this? My problem is quite similar to issue #143, but here the contigs are of similar size, so the reason for this inconsistency must be different.
Regarding the contigs (i.e. the plasmids): in genome 1 the contig is 88,043 bp long and the prophage was found at positions 63,963-81,576. Positions 81,594-88,043 are a transposon. In genomes 2 and 3, the plasmids are 81,588 bp and 91,396 bp long, respectively. Both match positions 1-81,595 of the genome 1 plasmid at almost 100% identity (the transposon is missing), so the genetic contexts are very similar. Because the circular plasmids were not linearized at the same position in each assembly, I expect the phage to lie at 11,201-28,814 in genome 2 and at 11,797-29,410 in genome 3.
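The coordinate arithmetic above can be double-checked with a small sketch. It assumes 1-based, inclusive coordinates (as in the positions quoted here) and only verifies that the expected regions in genomes 2 and 3 have the same length as the trimmed prophage in genome 1. The function names are hypothetical helpers, not part of VirSorter2 or CheckV.

```shell
#!/usr/bin/env bash
# Sketch: check that the expected prophage coordinates in genomes 2 and 3
# span the same length as the region called in genome 1.
# Assumes 1-based, inclusive coordinates (hypothetical helper functions).

# Length of an inclusive interval [start, end]
region_length() {
  local start=$1 end=$2
  echo $(( end - start + 1 ))
}

# End coordinate of a region of length `len` that begins at `start`
region_end() {
  local start=$1 len=$2
  echo $(( start + len - 1 ))
}

len=$(region_length 63963 81576)   # trimmed prophage in genome 1
echo "prophage length: ${len} bp"
echo "expected end in genome 2: $(region_end 11201 "$len")"
echo "expected end in genome 3: $(region_end 11797 "$len")"
```

With these inputs the length comes out at 17,614 bp, matching the CheckV-trimmed length, and the computed ends agree with the expected positions for genomes 2 and 3.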
In case a slight difference had pushed the region below a threshold, I tried rerunning VirSorter2 after changing the config as follows:
virsorter config --set PROVIRUS_MIN_PEAK_PROBA=0.6
virsorter config --set PROVIRUS_MIN_HALLMARK_CNT=0
It didn't help.
I also extracted the region of genomes 2 and 3 that the BLAST search indicates should be the phage, gave this sequence directly to CheckV, and then ran VirSorter2 on it (vs2-pass2 in the tutorial). This time it did produce an output, with a score of 0.927 and 1 hallmark gene.
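One way to pull out such a region for a standalone run is sketched here, assuming the plasmid contig has already been saved as its own single-record FASTA file. The `extract_region` function and the file names are hypothetical; `samtools faidx` with a `contig:start-end` region is a common alternative that does the same job.

```shell
#!/usr/bin/env bash
# Sketch: extract a 1-based, inclusive region from a single-record FASTA file.
# Assumes the file holds exactly one sequence (e.g. the plasmid contig alone);
# the function and file names are hypothetical.

extract_region() {
  local fasta=$1 start=$2 end=$3
  # Skip header lines, join the sequence lines (stripping stray whitespace),
  # then cut out the requested slice. awk substr() is 1-based.
  awk -v s="$start" -v e="$end" '
    /^>/ { next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END {
      printf ">region_%d_%d\n", s, e
      print substr(seq, s, e - s + 1)
    }
  ' "$fasta"
}

# Usage (hypothetical file name for the genome 2 plasmid contig):
# extract_region genome2_plasmid.fasta 11201 28814 > genome2_prophage.fasta
```

This sketch does not handle regions that wrap around the linearization point of a circular plasmid; the expected regions in genomes 2 and 3 lie well inside their contigs, so that case does not arise here.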
Do you have any idea why VirSorter2 does not identify these sequences as phage in genomes 2 and 3 when I give it the whole genome, or even just the contig of interest, as input, but does find them when I give it only the extracted sequence? And above all, why does it find the phage in genome 1?
Sincerely,
Héloïse