viralVerify
How to run this tool?
I'm working on an institute-wide pipeline for JCVI and had some trouble running your tool.
Here's my version installed via pip:
viral_verify --version
viral_verify, version 0.1.1
Here's my command:
viral_verify -i veba_output/binning/47-Drifterexpttime4punches_S40/tmp/unbinned.fasta -o veba_output/binning/47-Drifterexpttime4punches_S40/intermediate/viral_viralverify_output -H /usr/local/scratch/CORE/jespinoz/db/pfam/v33.1/Pfam-A.hmm -t 16
Edit: I had to decompress the Pfam database; that was the cause of the error in the original post, which I've since edited.
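For anyone hitting the same problem, here is a minimal Python sketch of the decompression step mentioned in the edit above. It assumes the HMM file was downloaded as a gzip archive (e.g. Pfam-A.hmm.gz); the helper names `ensure_decompressed` and `build_command` are hypothetical and not part of viral_verify. The flags are only the ones that appear in the command quoted in this post.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def ensure_decompressed(hmm_path: str) -> Path:
    """Return a path to an uncompressed HMM file, decompressing a gzip copy if needed."""
    path = Path(hmm_path)
    # Check the gzip magic bytes rather than trusting the file extension.
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"
    if not is_gzipped:
        return path
    # Drop a trailing .gz; otherwise append a suffix so the source is not overwritten.
    if path.suffix == ".gz":
        target = path.with_suffix("")
    else:
        target = path.with_name(path.name + ".decompressed")
    if not target.exists():
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target


def build_command(fasta: str, outdir: str, hmm: str, threads: int) -> List[str]:
    """Assemble the argument list using the flags from the post (-i, -o, -H, -t)."""
    hmm_file = ensure_decompressed(hmm)
    return ["viral_verify", "-i", fasta, "-o", outdir,
            "-H", str(hmm_file), "-t", str(threads)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting issues with long paths.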
Should I be using the PFAM database or the database from FigShare?
Can you update the Usage on your GitHub?
This is the results output:
veba_output/binning/47-Drifterexpttime4punches_S40/intermediate/viral_viralverify_output/
├── classified-fasta-output
│ ├── unbinned-chromosome.fasta
│ └── unbinned-unclassified.fasta
├── unbinned-circularized.fasta
├── unbinned-genes.fa
├── unbinned-hmmsearch.domtblout
├── unbinned-hmmsearch.output
├── unbinned-proteins-circularized.fa
├── unbinned-proteins.fa
└── unbinned-results.csv
I ran the version from GitHub on a different dataset and got the following output:
testing/viralverify_output/
├── oral_viruses_domtblout
├── oral_viruses_feature_table.txt
├── oral_viruses_genes.fa
├── oral_viruses_input_with_circ.fasta
├── oral_viruses_out_pfam
├── oral_viruses_prodigal.log
├── oral_viruses_proteins_circ.fa
├── oral_viruses_proteins.fa
├── oral_viruses_result_table.csv
├── Prediction_results_fasta
│ ├── oral_viruses_chromosome.fasta
│ ├── oral_viruses_plasmid.fasta
│ ├── oral_viruses_plasmid_uncertain.fasta
│ ├── oral_viruses_virus.fasta
│ └── oral_viruses_virus_uncertain.fasta
└── viralverify.log
1 directory, 15 files
How come the output is so different between the pip and GitHub versions?
That's funny - someone else forked this repo about a year ago, refactored it, and published it to PyPI as viral_verify (https://github.com/peterk87/viral_verify). That's why the output and such is so different. Thanks for pointing that out!
Meanwhile, our current GitHub version is awaiting approval for the bioconda channel. As soon as that happens, I'll update accordingly.
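Since the two implementations write differently named outputs, a pipeline that may encounter either one could check which layout it is looking at. The sketch below relies only on the file and directory names shown in the two listings earlier in this thread; whether those names hold for other versions is an assumption, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Marker subdirectories taken from the two directory listings in this thread.
LAYOUT_MARKERS = {
    "classified-fasta-output": "pip fork (viral_verify)",
    "Prediction_results_fasta": "GitHub viralVerify",
}


def detect_layout(output_dir: str) -> str:
    """Guess which implementation wrote output_dir from its subdirectory names."""
    names = {p.name for p in Path(output_dir).iterdir() if p.is_dir()}
    for marker, label in LAYOUT_MARKERS.items():
        if marker in names:
            return label
    return "unknown"


def find_results_table(output_dir: str) -> Optional[Path]:
    """Locate the results CSV: '*-results.csv' (fork) or '*_result_table.csv' (GitHub)."""
    root = Path(output_dir)
    matches = sorted(root.glob("*-results.csv")) + sorted(root.glob("*_result_table.csv"))
    return matches[0] if matches else None
```

Note that the columns inside the two CSV files may also differ, so downstream parsing would still need to be checked against each version.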
That is so weird. They took the namespace too?
What's the process like for getting something on bioconda?
That's open source, after all... The Bioconda submission turned out to be pretty straightforward: create a recipe (YAML and build.sh files) with all the metadata and dependencies, test it, and commit it to the bioconda-recipes repository (https://bioconda.github.io/contributor/workflow.html). Then, once all the CI tests pass, it needs to be reviewed by a Bioconda member. No idea how long that takes :) github.com/bioconda/bioconda-recipes/pull/30186
Hi there, which database did you end up using? Or does one of them give more accurate annotations?
I use geNomad now.
All right, thanks.
Apologies @AndAvia, I wrote that from my phone and should have given more context. Here's the geNomad publication: https://www.nature.com/articles/s41587-023-01953-y and here's the GitHub: https://github.com/apcamargo/genomad
I developed a wrapper around geNomad for the "binning-viral" module in my VEBA package (though it doesn't really bin; it identifies contigs that are viral). Here's the publication for VEBA (https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae528/7697622) and here's the GitHub (https://github.com/jolespin/veba).
If you only want to perform viral analysis, I would recommend just using geNomad, because VEBA has a lot of functionality in other modules (e.g., assembly with SPAdes, rnaSPAdes, or Flye; eukaryotic binning and gene modeling; etc.) and requires more dependencies and databases.
@jolespin Thank you so much for your patience in replying! I only need to do virus identification at the moment. Because there is so much virus identification software, I'm going to use genomad, VIBRANT, virfinder, deepvirfinder, virsorter, virsorter2, and ViralVerify. But you said that the two ViralVerify databases give different results, and I don't know which one to choose. I had a chance to look at your VEBA, and I found it very impressive! Wishing you a wonderful day!
Hi, since this tool was released in 2021 and hasn't received significant updates, I'd also recommend checking newer alternatives. If you still want to run viralVerify specifically, I'd definitely retrain the database with the updated Pfam-A and GenBank viral/plasmid/chromosomal sequences.