How to run this tool?

Open jolespin opened this issue 3 years ago • 9 comments

I'm working on an institute-wide pipeline for JCVI and had some trouble running your tool.

Here's my version installed via pip:

 viral_verify --version
viral_verify, version 0.1.1

Here's my command:

viral_verify -i veba_output/binning/47-Drifterexpttime4punches_S40/tmp/unbinned.fasta -o veba_output/binning/47-Drifterexpttime4punches_S40/intermediate/viral_viralverify_output -H /usr/local/scratch/CORE/jespinoz/db/pfam/v33.1/Pfam-A.hmm -t 16

Edit: I had to decompress the Pfam database; that was the cause of the error in my original post, which I have since edited.
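For anyone hitting the same problem, here is a minimal sketch of the decompression step in Python. It assumes the database was downloaded as a gzip file such as `Pfam-A.hmm.gz` (the file name and the helper name `decompress_hmm` are assumptions for illustration); plain `gunzip` on the command line does the same job.

```python
import gzip
import shutil
from pathlib import Path


def decompress_hmm(gz_path: str) -> Path:
    """Write an uncompressed copy of a .gz file next to the original."""
    src = Path(gz_path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got: {src}")
    dst = src.with_suffix("")  # e.g. Pfam-A.hmm.gz -> Pfam-A.hmm
    # Stream the data so the large HMM file never has to fit in memory.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

The returned path is what would then be passed to `-H` in the command above.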

Should I be using the PFAM database or the database from FigShare?

Can you update the Usage on your GitHub?

This is the results output:

veba_output/binning/47-Drifterexpttime4punches_S40/intermediate/viral_viralverify_output/
├── classified-fasta-output
│   ├── unbinned-chromosome.fasta
│   └── unbinned-unclassified.fasta
├── unbinned-circularized.fasta
├── unbinned-genes.fa
├── unbinned-hmmsearch.domtblout
├── unbinned-hmmsearch.output
├── unbinned-proteins-circularized.fa
├── unbinned-proteins.fa
└── unbinned-results.csv

I ran the version from GitHub on a different dataset and got the following output:

testing/viralverify_output/
├── oral_viruses_domtblout
├── oral_viruses_feature_table.txt
├── oral_viruses_genes.fa
├── oral_viruses_input_with_circ.fasta
├── oral_viruses_out_pfam
├── oral_viruses_prodigal.log
├── oral_viruses_proteins_circ.fa
├── oral_viruses_proteins.fa
├── oral_viruses_result_table.csv
├── Prediction_results_fasta
│   ├── oral_viruses_chromosome.fasta
│   ├── oral_viruses_plasmid.fasta
│   ├── oral_viruses_plasmid_uncertain.fasta
│   ├── oral_viruses_virus.fasta
│   └── oral_viruses_virus_uncertain.fasta
└── viralverify.log

1 directory, 15 files
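Both layouts above contain a folder of per-class FASTA files (`classified-fasta-output` in one, `Prediction_results_fasta` in the other). As a rough way to compare what each run actually classified, the sketch below counts the sequences in every `*.fasta` file of such a folder. The function name `count_fasta_records` is hypothetical, and it only assumes standard FASTA files in which each record starts with a `>` header line.

```python
from pathlib import Path


def count_fasta_records(fasta_dir: str) -> dict:
    """Map each *.fasta file name in fasta_dir to its number of sequences."""
    counts = {}
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        with open(fasta) as handle:
            # Every FASTA record starts with a '>' header line.
            counts[fasta.name] = sum(1 for line in handle if line.startswith(">"))
    return counts
```

For example, `count_fasta_records("testing/viralverify_output/Prediction_results_fasta")` would return one count per prediction class for the GitHub-version run.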

How come the output is so different between the pip and GitHub versions?

jolespin avatar Aug 11 '21 02:08 jolespin

That's funny - someone else forked this repo about a year ago, refactored it, and submitted it to PyPI as viral_verify (https://github.com/peterk87/viral_verify). That's why the output and everything else is so different. Thanks for pointing that out!
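To check which of the two packages is actually installed in a given environment, the standard-library `importlib.metadata` module can report the distribution's version and homepage. This is only a sketch: the helper name `describe_distribution` is made up for illustration, and it assumes the fork is installed under the distribution name `viral_verify`, as on PyPI.

```python
from importlib import metadata


def describe_distribution(name: str) -> dict:
    """Return name, version and homepage of an installed distribution, or {} if absent."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return {}
    meta = dist.metadata
    return {
        "name": meta.get("Name"),
        "version": dist.version,
        # May be None if the package does not declare a homepage.
        "home_page": meta.get("Home-page"),
    }


print(describe_distribution("viral_verify"))
```

An empty dictionary means no distribution with that name is installed in the current environment.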

Meanwhile, our current GitHub version is awaiting approval for the Bioconda channel. As soon as that happens, I'll update accordingly.

mikeraiko avatar Aug 22 '21 20:08 mikeraiko

That is so weird. They took the namespace too?

What's the process like for getting something on bioconda?

jolespin avatar Aug 22 '21 21:08 jolespin

That's open source, after all... Bioconda submission turned out to be pretty straightforward: create a recipe (YAML and build.sh files) with all the metadata and dependencies, test it, and commit it to the bioconda-recipes repository (https://bioconda.github.io/contributor/workflow.html). Then, once all CI tests pass, it needs to be reviewed by a Bioconda member. No idea how long that takes :) github.com/bioconda/bioconda-recipes/pull/30186

mikeraiko avatar Aug 22 '21 21:08 mikeraiko

Hi there. In the end, which database did you use? Or is one of them annotated a bit more accurately?

AndAvia avatar Jul 23 '24 01:07 AndAvia

I use geNomad now.

jolespin avatar Jul 23 '24 05:07 jolespin

All right, thanks.

AndAvia avatar Jul 23 '24 05:07 AndAvia

Apologies @AndAvia, I wrote that from my phone and should have given more context. Here's the geNomad publication: https://www.nature.com/articles/s41587-023-01953-y and here's the GitHub: https://github.com/apcamargo/genomad

I developed a wrapper around geNomad for the "binning-viral" module in my VEBA package (though it doesn't really bin so much as identify contigs that are viral). Here's the publication for VEBA (https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae528/7697622) and here's the GitHub (https://github.com/jolespin/veba).

If you only want to perform viral analysis, I would recommend just using geNomad, because VEBA has a lot of functionality in other modules (e.g., assembly with SPAdes, rnaSPAdes, or Flye, eukaryotic binning/gene modeling, etc.) and requires more dependencies and databases.

jolespin avatar Jul 23 '24 17:07 jolespin

@jolespin Thank you so much for your patience in replying! I only need to do virus identification at the moment. Because there are so many virus identification tools, I'm going to use geNomad, VIBRANT, VirFinder, DeepVirFinder, VirSorter, VirSorter2, viralVerify, and others. But you said that viralVerify's two databases give different results, and I don't know which database to choose. I had a chance to look at your VEBA, and I found it very impressive! Wishing you a wonderful day!

AndAvia avatar Jul 24 '24 10:07 AndAvia

Hi. Since this tool was released in 2021 and hasn't received significant updates since, I'd also recommend checking newer alternatives. If you still want to run viralVerify specifically, I'd definitely retrain the database with the updated Pfam-A and GenBank viral/plasmid/chromosomal sequences.

Dmitry-Antipov avatar Aug 12 '24 18:08 Dmitry-Antipov