enrichM
super long run time for 2 genomes
Hello,
I have 2 genomes in fna, this is my script:
source activate enrichm_0.5.0
export ENRICHM_DB=/path_to_enrichm_db
enrichm annotate \
  --output /path_to_output \
  --genome_directory /path_to_genomes \
  --ko_hmm \
  --ec \
  --pfam \
  --orthologs \
  --threads 8 \
  --log /path_to_out/LOG
conda deactivate
I submitted it to a server, requesting 8 cores and 250 GB RAM. It was killed after 108 hours for exceeding the wall-time limit: PBS: job killed: walltime 388897 exceeded limit 388800 (the unit is seconds)
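As a quick sanity check on those PBS figures, here is a small sketch of the arithmetic. It assumes the values are in seconds, which is what a 108-hour limit implies; the variable names are just for illustration.

```shell
# Values copied from the PBS message; assumed to be in seconds.
walltime_limit=388800   # limit from the PBS message
walltime_used=388897    # walltime reported when the job was killed

limit_hours=$((walltime_limit / 3600))       # seconds -> hours
overrun=$((walltime_used - walltime_limit))  # seconds past the limit

echo "limit: ${limit_hours} h, overrun: ${overrun} s"
```

So the job ran right up to the 108-hour limit rather than failing on its own.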
Can the program simply pick up where it left off if I re-run the job?
The genomes are 2M in size, does this run time seem normal to you?
Many thanks!
Hi there,
Thanks again for the bug report. This run time isn't normal. Other users have run into this as well. I'll be looking into it soon and will keep you posted (it's a busy time for me at the moment, sorry for the delays).
Thanks, Joel
Hi again,
I tried again with only one MAG, 1.8 M in size. I requested 12 cores and 250 GB RAM on a server, and the job was again killed after 108 hours. The log file doesn't say much:
[2019-06-20 09:29:43 AM] INFO: Command: /path_to_env/Miniconda3/envs/enrichm_0.5.0/bin/enrichm annotate --output /path_to_file/08annotate_enrichm_AOA/out --genome_files /path_to_file/08annotate_enrichm_AOA/bin4.fna --ko_hmm --ec --pfam --orthologs --threads 12 --log /path_to_file/08annotate_enrichm_AOA/LOG
[2019-06-20 09:29:43 AM] INFO: Running the annotate pipeline
[2019-06-20 09:29:43 AM] INFO: Running pipeline: annotate
[2019-06-20 09:29:43 AM] INFO: Setting up for genome annotation
[2019-06-20 09:29:43 AM] INFO: Calling proteins for annotation
[2019-06-20 09:29:43 AM] INFO: Preparing genomes for annotation
[2019-06-20 09:29:43 AM] INFO: - Calling proteins for 1 genomes
[2019-06-20 09:30:34 AM] INFO: Starting annotation:
[2019-06-20 09:30:34 AM] INFO: - Annotating genomes with hypothetical clusters
[2019-06-20 09:30:34 AM] INFO: - Generating MMSeqs2 database
[2019-06-20 09:30:34 AM] INFO: - Clustering genome proteins
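The log stops at the clustering step, less than a minute after the run started. For anyone comparing their own runs, this is a rough sketch of how to pull the last step out of a log. It assumes one "[timestamp] INFO: message" entry per line; the function name `last_logged_step` is made up for this example and is not part of EnrichM.

```shell
# Print the last INFO message in a log file, i.e. the step the run
# had reached when it stopped.
# Assumes one "[timestamp] INFO: message" entry per line.
last_logged_step() {
  awk -F'INFO: ' '/INFO: / { last = $2 } END { print last }' "$1"
}

# Usage, with the placeholder path from the command above:
# last_logged_step /path_to_file/08annotate_enrichm_AOA/LOG
```

If this prints the same step after every killed run, the stall is reproducible at that step rather than somewhere in the later annotation stages.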
Please let me know how to fix it.
Cheers
Hi,
Just wondering if there's any update on this?
Cheers
Hi,
Sorry for hijacking the thread, but I am currently encountering something similar. From what I have seen, this problem is caused by the MMseqs2 database not being generated properly. I managed to get through this step by manually hardcoding the input faa and output db (line 398 of annotate.py in the libs), but ran into more problems downstream (with the genome_dicts).
Alternatively, you could skip this step by dropping --orthologs from your command.
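For illustration, this is roughly what the command from the first post looks like with --orthologs removed. It is only a sketch: the paths are the placeholders from that post, the `ENRICHM_CMD` variable is introduced here for convenience, and I have not verified that the remaining steps finish in reasonable time.

```shell
# Flags from the original post, minus --orthologs (the option that
# triggers the MMseqs2 clustering step where the runs stall).
# Paths are the placeholders used in the original post.
ENRICHM_CMD="enrichm annotate \
  --output /path_to_output \
  --genome_directory /path_to_genomes \
  --ko_hmm \
  --ec \
  --pfam \
  --threads 8 \
  --log /path_to_out/LOG"

# Print the command so it can be checked before submitting the job.
echo "$ENRICHM_CMD"

# Run it only if enrichm is available in the active environment.
if command -v enrichm >/dev/null 2>&1; then
  $ENRICHM_CMD
fi
```

This gives up the ortholog clustering output but keeps the KO, EC and Pfam annotations.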
Cheers