DIAMOND job failures are not caught?
I'm getting the following error when running CheckM2 v1.0.1:
[04/05/2023 03:39:06 AM] INFO: Running CheckM2 version 1.0.1
[04/05/2023 03:39:06 AM] INFO: Running quality prediction workflow with 6 threads.
[04/05/2023 03:39:07 AM] INFO: Running in low-memory mode.
[04/05/2023 03:39:09 AM] INFO: Calling genes in 2 bins with 6 threads:
Finished processing 1 of 2 (50.00%) bins.
Finished processing 2 of 2 (100.00%) bins.
[04/05/2023 03:39:11 AM] INFO: Calculating metadata for 2 bins with 6 threads:
Finished processing 1 of 2 (50.00%) bin metadata.
Finished processing 2 of 2 (100.00%) bin metadata.
[04/05/2023 03:39:11 AM] INFO: Annotating input genomes with DIAMOND using 6 threads
[04/05/2023 03:39:53 AM] INFO: Processing DIAMOND output
[04/05/2023 03:39:53 AM] ERROR: No DIAMOND annotation was generated. Exiting
2023-04-05_03:39:54 Shutdown FusionFS v0.6.5-3f0d0c9
2023-04-05_03:39:54 Done
I'm guessing that the DIAMOND job failed, but the error was not caught due to using:
try:
    cmd = "diamond blastp --outfmt 6 --max-target-seqs 1 " \
          "--query {} " \
          "-o {} " \
          "--threads {} " \
          "--db {} " \
          "--query-cover {} " \
          "--subject-cover {} " \
          "--id {} " \
          "--evalue {} --block-size {} "\
          "--tmpdir {} --quiet "\
        .format(temp_diamond_input.name,
                diamond_output,
                self.threads,
                self.diamond_location,
                DefaultValues.DIAMOND_QUERY_COVER,
                DefaultValues.DIAMOND_SUBJECT_COVER,
                DefaultValues.DIAMOND_PERCENT_ID,
                DefaultValues.DIAMOND_EVALUE,
                float(self.blocksize),
                diamond_working_dir.name)
    logging.debug(cmd)
    subprocess.call(cmd, shell=True)
    logging.debug('Finished Running DIAMOND')
except Exception as e:
    logging.error('An error occured while running DIAMOND: {}'.format(e))
    sys.exit(1)
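subprocess.call() only returns the exit status and never raises when the command fails, so the except block is never reached. Note that subprocess.call() does not accept a check argument. Changing subprocess.call(cmd, shell=True) to subprocess.run(cmd, shell=True, check=True), or to subprocess.check_call(cmd, shell=True), should raise subprocess.CalledProcessError on a non-zero exit status, which the except block would then catch.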
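The difference between the two calls can be seen in a small standalone sketch. The failing command is a stand-in (a Python one-liner that exits with status 3), not DIAMOND itself, and run_unchecked / run_checked are hypothetical names used only for illustration.

```python
import subprocess
import sys

# Stand-in for a failing DIAMOND call: a child process that exits with status 3.
FAILING_CMD = [sys.executable, "-c", "import sys; sys.exit(3)"]


def run_unchecked(cmd):
    """subprocess.call returns the exit status and does not raise on failure."""
    return subprocess.call(cmd)


def run_checked(cmd):
    """subprocess.run(check=True) raises CalledProcessError on a non-zero exit."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as e:
        return "caught exit status {}".format(e.returncode)
    return "no error"


if __name__ == "__main__":
    print(run_unchecked(FAILING_CMD))  # the failure is only visible in the return value
    print(run_checked(FAILING_CMD))    # the failure surfaces as an exception
```

With the unchecked variant, the caller has to inspect the return value itself; if it is ignored, as in the snippet above, execution simply continues.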
The code rewritten by GPT-4:
try:
    cmd = "diamond blastp --outfmt 6 --max-target-seqs 1 " \
          "--query {} " \
          "-o {} " \
          "--threads {} " \
          "--db {} " \
          "--query-cover {} " \
          "--subject-cover {} " \
          "--id {} " \
          "--evalue {} --block-size {} "\
          "--tmpdir {} --quiet "\
        .format(temp_diamond_input.name,
                diamond_output,
                self.threads,
                self.diamond_location,
                DefaultValues.DIAMOND_QUERY_COVER,
                DefaultValues.DIAMOND_SUBJECT_COVER,
                DefaultValues.DIAMOND_PERCENT_ID,
                DefaultValues.DIAMOND_EVALUE,
                float(self.blocksize),
                diamond_working_dir.name)
    logging.debug(cmd)
    result = subprocess.run(cmd, shell=True, stderr=subprocess.PIPE, encoding='utf-8', check=True)
    logging.debug('Finished Running DIAMOND')
except subprocess.CalledProcessError as e:
    logging.error('An error occured while running DIAMOND: {}'.format(e.stderr))
    sys.exit(1)
finally:
    diamond_working_dir.cleanup()
    temp_diamond_input.close()
I updated DIAMOND to the latest version (2.1.6), and that solved this error.
@Caiyulu-818 the problem was the version of diamond, and not the subprocess.run() code (above)?
I just updated DIAMOND to the latest version, and the CheckM2 testrun completed successfully. I have also used it on my own genome.
You can try it.
If DIAMOND fails (e.g., due to lack of memory), will the current Python code catch the error? That was my main concern. The DIAMOND version that I am using works if I provide enough memory, but for some HTC jobs I don't provide enough memory, and the DIAMOND job dies. It appears that the existing code doesn't handle all scenarios in which the DIAMOND job dies.
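A minimal sketch of a wrapper that would cover these cases is shown below. It is not CheckM2's code: run_and_verify and ExternalToolError are hypothetical names, and the command is run as an argument list without a shell. On POSIX systems, a child killed by a signal (for example SIGKILL from an out-of-memory kill) is reported by subprocess as a negative return code, so the sketch checks for that, for a non-zero exit status, and for a missing or empty output file.

```python
import os
import signal
import subprocess


class ExternalToolError(RuntimeError):
    """Hypothetical exception type for a failed external command."""


def run_and_verify(argv, output_path):
    """Run argv without a shell and fail loudly if no output was produced."""
    result = subprocess.run(argv, stderr=subprocess.PIPE, encoding="utf-8")
    code = result.returncode
    if code < 0:
        # POSIX: a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "signal {}".format(-code)
        raise ExternalToolError("killed by {}".format(name))
    if code != 0:
        raise ExternalToolError(
            "exit status {}: {}".format(code, result.stderr.strip()))
    # A zero exit status alone does not prove that results were written.
    if not os.path.isfile(output_path) or os.path.getsize(output_path) == 0:
        raise ExternalToolError("no output written to {}".format(output_path))
    return output_path
```

A caller could catch ExternalToolError, log its message, and exit, which would make the log distinguish a killed job from a run that produced no hits. Whether an empty output file should count as a failure is a judgment call; it is treated as one here because the existing workflow already stops in that case.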
Thanks for catching this. Next version will have more thorough DIAMOND error handling.
[sung.shin@arsnecla0ap2 checkm2_07172024]$ cat checkm2.log
[07/17/2024 07:06:33 AM] INFO: Running CheckM2 version 1.0.1
[07/17/2024 07:06:33 AM] INFO: Custom database path provided for predict run. Checking database at /home/sung.shin/databases/CheckM2_database/uniref100.KO.1.dmnd...
[07/17/2024 07:06:36 AM] INFO: Running quality prediction workflow with 30 threads.
[07/17/2024 07:06:41 AM] INFO: Calling genes in 2704 bins with 30 threads:
[07/17/2024 08:03:48 AM] INFO: Calculating metadata for 2704 bins with 30 threads:
[07/17/2024 08:04:01 AM] INFO: Annotating input genomes with DIAMOND using 30 threads
[07/17/2024 08:18:59 AM] INFO: Processing DIAMOND output
[07/17/2024 08:18:59 AM] ERROR: No DIAMOND annotation was generated. Exiting
I downloaded and installed CheckM2 v1.0.1, and I also ran CheckM2 v1.0.1 and v1.0.2 through Singularity. All of these gave me the same error.
If it is a DIAMOND issue, could you tell me how to fix or update it?
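Since upgrading DIAMOND to 2.1.6 resolved the same error earlier in this thread, one starting point is to check which DIAMOND version is actually being picked up. The sketch below is illustrative only: parse_version and installed_diamond_version are hypothetical helper names, and it assumes that `diamond version` prints a line containing the version number (something like "diamond version 2.1.6"). For a Singularity image, the check would need to be run inside the container.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Return the first X.Y.Z number found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_diamond_version(executable="diamond"):
    """Return the version of the DIAMOND binary on PATH, or None if unavailable."""
    if shutil.which(executable) is None:
        return None
    # Assumption: `diamond version` prints something like "diamond version 2.1.6".
    result = subprocess.run([executable, "version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, encoding="utf-8")
    return parse_version(result.stdout)


if __name__ == "__main__":
    version = installed_diamond_version()
    if version is None:
        print("DIAMOND not found on PATH, or its version could not be parsed")
    elif version < (2, 1, 6):
        print("DIAMOND {} is older than 2.1.6".format(".".join(map(str, version))))
    else:
        print("DIAMOND {}".format(".".join(map(str, version))))
```

The 2.1.6 threshold is taken from the report above and is not a documented minimum version for CheckM2.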