ERROR: DIAMOND finished abnormally - but no info to help me fix the problem
Hi Bastiaan,
I had a large contig file with many host sequences, which I removed using bbtools sketch, followed by parsing in R to generate a contig fasta (using the package Biostrings).
I then tried to run CAT, with the same script that we have been using except for changing the name of the input file and the location (but not the names) of the output files. This is dying with a cryptic message during the Diamond step and I was wondering if you could offer some advice.
I am running in a slurm environment:
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
module load cat
module load diamond/0.9.34-python-3.6.5
# Set variables
DIR=./assembly/contig_dictionary
mkdir -p ./assembly/contig_dictionary/darkCAT
CAT contigs -c $DIR/dark_viral_dictionary.fasta -o $DIR/darkCAT/out.CAT \
  --force \
  --sensitive \
  -d /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database \
  -t /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_taxonomy;
Here are the log file contents:
CAT v5.0.4.
CAT is running. Protein prediction, alignment, and contig classification are carried out. Rarw!
Supplied command: /opt/htcf/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/cat-5.0.4-yjd7dkz4inaobretcvqtudyeeqvn73h2/bin/CAT contigs -c ./assembly/contig_dictionary/dark_viral_dictionary.fasta -o ./assembly/contig_dictionary/darkCAT/out.CAT --force --sensitive -d /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database -t /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_taxonomy
Contigs fasta: ./assembly/contig_dictionary/dark_viral_dictionary.fasta
Taxonomy folder: /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_taxonomy/
Database folder: /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database/
Parameter r: 10
Parameter f: 0.5
Log file: ./assembly/contig_dictionary/darkCAT/out.CAT.log
Doing some pre-flight checks first.
[2020-12-22 18:20:06.618114] Prodigal found: Prodigal V2.6.3: February, 2016.
[2020-12-22 18:20:06.656205] DIAMOND found: diamond version 0.9.34.
Ready to fly!
[2020-12-22 18:20:06.666554] Importing contig names from ./assembly/contig_dictionary/dark_viral_dictionary.fasta.
[2020-12-22 18:20:06.986500] Running Prodigal for ORF prediction. Files ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.faa and ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.gff will be generated. Do not forget to cite Prodigal when using CAT or BAT in your publication!
[2020-12-22 18:20:13.737901] ORF prediction done!
[2020-12-22 18:20:13.741936] Parsing ORF file ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.faa
[2020-12-22 18:20:13.758769] Homology search with DIAMOND is starting. Please be patient. Do not forget to cite DIAMOND when using CAT or BAT in your publication!
query: ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.faa
database: /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database/2020-06-18.nr.dmnd
mode: sensitive
number of cores: 48
block-size (billions of letters): 2.0
index-chunks: 4
tmpdir: ./assembly/contig_dictionary/darkCAT
top: 50
[2020-12-22 16:49:30.329757] ERROR: DIAMOND finished abnormally.
Your log does not show the Diamond console output. Please see here on how to activate this: https://github.com/dutilh/CAT/issues/37
Here is the log you asked for:
[2020-12-23 09:15:03.933246] ERROR: DIAMOND finished abnormally.
CAT v5.0.4.
CAT is running. Protein prediction, alignment, and contig classification are carried out. Rarw!
Supplied command: /opt/htcf/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/cat-5.0.4-yjd7dkz4inaobretcvqtudyeeqvn73h2/bin/CAT contigs -c ./assembly/contig_dictionary/dark_viral_dictionary.fasta -o ./assembly/contig_dictionary/darkCAT/out.CAT --force --sensitive -d /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database -t /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_taxonomy
Contigs fasta: ./assembly/contig_dictionary/dark_viral_dictionary.fasta
Taxonomy folder: /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_taxonomy/
Database folder: /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database/
Parameter r: 10
Parameter f: 0.5
Log file: ./assembly/contig_dictionary/darkCAT/out.CAT.log
Doing some pre-flight checks first.
[2020-12-23 07:24:17.491876] Prodigal found: Prodigal V2.6.3: February, 2016.
[2020-12-23 07:24:17.525900] DIAMOND found: diamond version 0.9.34.
Ready to fly!
[2020-12-23 07:24:17.545762] Importing contig names from ./assembly/contig_dictionary/dark_viral_dictionary.fasta.
[2020-12-23 07:24:17.806534] Running Prodigal for ORF prediction. Files ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.faa and ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.gff will be generated. Do not forget to cite Prodigal when using CAT or BAT in your publication!
[2020-12-23 07:24:27.631373] ORF prediction done!
[2020-12-23 07:24:27.636896] Parsing ORF file ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.faa
[2020-12-23 07:24:27.658530] Homology search with DIAMOND is starting. Please be patient. Do not forget to cite DIAMOND when using CAT or BAT in your publication!
query: ./assembly/contig_dictionary/darkCAT/out.CAT.predicted_proteins.faa
database: /scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database/2020-06-18.nr.dmnd
mode: sensitive
number of cores: 8
block-size (billions of letters): 2.0
index-chunks: 4
tmpdir: ./assembly/contig_dictionary/darkCAT
top: 50
This is still missing the console output from DIAMOND, which I need in order to get an idea of why it is failing. Please try to find this, or ask at the CAT repo for help if necessary.
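One possible way to surface that console output, sketched under assumptions, is to run DIAMOND directly on the predicted proteins with the settings printed in the CAT log and capture both stdout and stderr. The flags are ordinary `diamond blastp` options, but this is not necessarily the exact command CAT issues internally. The file names `out.CAT.diamond_test.tsv` and `diamond_console.log` are hypothetical, and the thread count falls back to the 16 CPUs requested in the job script.

```shell
#!/usr/bin/env bash
# Paths and values below are copied from the CAT log in this thread.
OUT=./assembly/contig_dictionary/darkCAT/out.CAT
DB=/scratch/ref/cat/CAT_prepare_20200618/2020-06-18_CAT_database/2020-06-18.nr.dmnd

# Assemble the DIAMOND call as an array so it can be printed before it is run.
cmd=(
  diamond blastp
  --query "$OUT.predicted_proteins.faa"
  --db "$DB"
  --out "$OUT.diamond_test.tsv"   # hypothetical name, not a file CAT writes
  --outfmt 6
  --sensitive
  --top 50
  --block-size 2.0
  --index-chunks 4
  --tmpdir ./assembly/contig_dictionary/darkCAT
  --threads "${SLURM_CPUS_PER_TASK:-16}"   # use the CPUs slurm actually granted
)

echo "Running: ${cmd[*]}"

# Only run when diamond is on PATH (i.e. the module has been loaded).
if command -v diamond >/dev/null 2>&1; then
  # Merge stderr into stdout and keep a copy of everything DIAMOND prints.
  "${cmd[@]}" 2>&1 | tee diamond_console.log
else
  echo "diamond not found on PATH; load the module first" >&2
fi
```

If this runs inside an sbatch job, the job's own output file (by default `slurm-<jobid>.out` in the submission directory) may also contain whatever DIAMOND printed before it stopped.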