pb-metagenomics-tools
Error in rule Checkm2Database
Name the workflow HiFi-MAG-Pipeline
Describe the bug Error in rule Checkm2Database
Expected behavior Expected the pipeline to run as normal
Screenshots
[Tue Oct 24 10:29:41 2023]
localrule Checkm2Database:
    input: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/inputs/revio_all.contigs.fasta
    output: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/CheckM2_database/uniref100.KO.1.dmnd, /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/CheckM2_database/CheckM2.complete.txt
    log: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/logs/Checkm2Database.log
    jobid: 15
    benchmark: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/benchmarks/Checkm2Database.tsv
    reason: Missing output files: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/CheckM2_database/uniref100.KO.1.dmnd
    resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/38b2454ccc60e533a4b4041ae242f4cc
[Tue Oct 24 10:29:43 2023]
Error in rule Checkm2Database:
    jobid: 15
    output: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/CheckM2_database/uniref100.KO.1.dmnd, /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/CheckM2_database/CheckM2.complete.txt
    log: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/logs/Checkm2Database.log (check log file(s) for error message)
    conda-env: /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/.snakemake/conda/38b2454ccc60e533a4b4041ae242f4cc
    shell:
        checkm2 database --download --path /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline &> /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/logs/Checkm2Database.log && touch /data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/CheckM2_database/CheckM2.complete.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-10-24T102939.524220.snakemake.log
Log files
Traceback (most recent call last):
File "/data0/hifi/new_hifi/pb-metagenomics-tools-2.1.0/HiFi-MAG-Pipeline/.snakemake/conda/38b2454ccc60e533a4b4041ae242f4cc/bin/checkm2", line 27, in
I am having the same issue as mentioned above.
Traceback (most recent call last):
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/bin/checkm2", line 27, in <module>
from checkm2 import predictQuality
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/checkm2/predictQuality.py", line 1, in <module>
from checkm2 import modelProcessing
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/checkm2/modelProcessing.py", line 17, in <module>
from tensorflow import keras
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/tensorflow/__init__.py", line 41, in <module>
from tensorflow.python.tools import module_util as _module_util
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 41, in <module>
from tensorflow.python.eager import context
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 28, in <module>
from absl import logging
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/absl/logging/__init__.py", line 97, in <module>
from absl import flags
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/absl/flags/__init__.py", line 35, in <module>
from absl.flags import _argument_parser
File "/data1/snakemake/pb-metagenomics-tools/HiFi-MAG-Pipeline/.snakemake/conda/5ecc75c830d4c67a2636691686d458e0_/lib/python3.6/site-packages/absl/flags/_argument_parser.py", line 82, in <module>
class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
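The final TypeError is Python's generic metaclass-conflict error. One plausible reading, not confirmed in this thread, is that the environment resolved to Python 3.6 (note the `lib/python3.6` paths), where `typing.Generic` still carried its own metaclass, alongside an absl release whose `ArgumentParser` requests a different one. The sketch below reproduces the same kind of error with two hypothetical metaclasses, `MetaA` and `MetaB`; it does not import absl or TensorFlow and runs on any Python 3.

```python
# Minimal reproduction of a metaclass conflict.
# MetaA, MetaB, Base and Derived are hypothetical names for illustration.


class MetaA(type):
    """Stands in for the metaclass already attached to a base class."""


class MetaB(type):
    """Stands in for an unrelated metaclass requested by the subclass."""


class Base(metaclass=MetaA):
    pass


def try_define_conflicting_class():
    """Return the error message raised when the two metaclasses clash."""
    try:
        # MetaB is not a subclass of MetaA, so Python cannot choose a
        # single metaclass for Derived and raises TypeError.
        class Derived(Base, metaclass=MetaB):
            pass
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_define_conflicting_class())
```

If this reading is right, the fix lies in which Python and absl versions the solver picks, which is consistent with the version-pinning workaround described next.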
So, the environments will fail to install on our HPC with strict channel priorities (which is why I love Docker/Nextflow so much these days ...). I think incompatible packages are being installed, which leads to this error.
I was able to work around this by editing checkm2.yml: I pinned packages to versions that are compatible per the CheckM2 yml on their GitHub and specified Python 3.8. I still had to disable strict channel priority for it to resolve:
envs/checkm2.yml
name: checkm2_env
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - checkm2 == 1.0.1
  - python == 3.8
  - scikit-learn=0.23.2
  - h5py=2.10.0
  - numpy=1.19.2
  - diamond=2.0.4
  - tensorflow >= 2.2.0, <2.6.0
  - lightgbm=3.2.1
  - pandas=1.4.0
  - scipy=1.8.0
  - prodigal=2.6.3
  - setuptools
  - requests
  - packaging
  - tqdm
Hi @Rafa-Seong, @nallsing-salk, and @MicroSeq, thanks for your patience. This might be related to an issue with CheckM2, specifically with the Zenodo API. See the thread here: https://github.com/chklovski/CheckM2/issues/83
It may or may not have been resolved.
I would recommend removing the existing conda environment and trying to re-install. Please let me know if this issue persists.
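To find which environment to remove, the logs above show that Snakemake keeps its conda environments under `.snakemake/conda/<hash>/` inside the workflow directory, with the CheckM2 executable at `bin/checkm2`. The following is a minimal sketch that only lists matching environment directories; the function name `find_checkm2_envs` is hypothetical, and deleting the directory is left to the user.

```python
from pathlib import Path


def find_checkm2_envs(workflow_dir):
    """Return Snakemake conda env directories that contain bin/checkm2.

    Hypothetical helper: it assumes the .snakemake/conda/<hash>/ layout
    seen in the log paths of this thread.
    """
    conda_root = Path(workflow_dir) / ".snakemake" / "conda"
    if not conda_root.is_dir():
        return []
    return sorted(
        env
        for env in conda_root.iterdir()
        if env.is_dir() and (env / "bin" / "checkm2").exists()
    )


if __name__ == "__main__":
    # Run from the HiFi-MAG-Pipeline directory; prints one path per env.
    for env in find_checkm2_envs("."):
        print(env)
```

After removing a listed directory, rerunning the workflow with conda enabled should make Snakemake rebuild the environment from the yml file.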
I would prefer to keep the conda recipe as simple as possible, as pinning versions may work for some systems but not others.
I had the same issue. @MicroSeq's solution worked for me, but I guess it's not preferable in the long run. Maybe making it possible to download the CheckM2 db manually and point to it in the config, as you have for GTDB, could be a feature worth adding in time?
This would be the best solution as many HPC systems are configured without internet access on the nodes.
Update: I removed the Checkm2Database rule and modified Checkm2ContigAnalysis to take a pre-downloaded DIAMOND database, then ran with the original checkm2.yaml conda env (i.e. not the workaround described by @MicroSeq). I got the same error as @nallsing-salk and @Rafa-Seong, so I think this means it's an issue with something in the checkm2 conda env and not the Zenodo API?
Branch available here if anyone wants to confirm they get the same error.
I am unable to reproduce errors related to the checkm2 conda env, but keep me posted on any progress or continued challenges here.
Downloading the checkm2 database manually before beginning the workflow is now required, fixed in #79.
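Since the database must now be in place before the workflow starts, a quick pre-flight check can catch a missing download early, which helps on HPC nodes without internet access. The sketch below is an illustration only: the helper name `find_diamond_db` is hypothetical, and the default directory `CheckM2_database` is taken from the log output earlier in this thread, so the location the pipeline expects after #79 should be confirmed in the repository documentation.

```python
import sys
from pathlib import Path


def find_diamond_db(db_dir):
    """Return the first *.dmnd file in db_dir, or None if there is none.

    Hypothetical helper; it only checks that a DIAMOND database file
    exists and does not validate its contents.
    """
    db_path = Path(db_dir)
    if not db_path.is_dir():
        return None
    candidates = sorted(db_path.glob("*.dmnd"))
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # The default directory name is an assumption based on the logs above.
    db_dir = sys.argv[1] if len(sys.argv) > 1 else "CheckM2_database"
    db_file = find_diamond_db(db_dir)
    if db_file is None:
        sys.exit(f"No .dmnd file found in {db_dir}; download the CheckM2 database first.")
    print(f"Found CheckM2 database: {db_file}")
```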