enrichM super long run time for 2 genomes

Hello,

I have 2 genomes in fna, this is my script:

source activate enrichm_0.5.0
export ENRICHM_DB=/path_to_enrichm_db

enrichm annotate \
--output /path_to_output \
--genome_directory /path_to_genomes \
--ko_hmm \
--ec \
--pfam \
--orthologs \
--threads 8 \
--log /path_to_out/LOG

conda deactivate

I submitted it to a server, requesting 8 cores and 250 GB RAM. It was killed after 108 hours for exceeding the wall-time limit: PBS: job killed: walltime 388897 exceeded limit 388800 (unit is seconds)
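For reference, a quick check of the PBS figures, assuming the walltime values in the message are reported in seconds (which is what makes them agree with the 108 hours above). The variable names are only for illustration:

```shell
#!/bin/sh
# Values copied from the PBS message; assumed to be in seconds.
LIMIT_S=388800
USED_S=388897

# Convert the limit to whole hours and compute the overrun.
LIMIT_H=$((LIMIT_S / 3600))
OVER_S=$((USED_S - LIMIT_S))

echo "limit: ${LIMIT_H} h, exceeded by ${OVER_S} s"
```

So the job ran out its full 108-hour allocation and was killed 97 seconds past the limit.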

Can the program simply pick up where it left off if I re-run the job?

The genomes are 2M in size. Does this run time seem normal to you?

Many thanks!

Jun 19 '19 04:06 ganiatgithub

Hi there,

Thanks again for the bug report. This run time isn't normal. Other users have run into this as well. I'll be looking into it soon and will keep you posted (it's a busy time for me at the moment, sorry for the delays).

Thanks, Joel

Jun 19 '19 05:06 geronimp

Hi again,

I tried again with only one MAG, 1.8 M in size. I requested 12 cores and 250 GB RAM from a server, and it was again killed after 108 hours. There is not much info in the log file:

[2019-06-20 09:29:43 AM] INFO: Command: /path_to_env/Miniconda3/envs/enrichm_0.5.0/bin/enrichm annotate --output /path_to_file/08annotate_enrichm_AOA/out --genome_files /path_to_file/08annotate_enrichm_AOA/bin4.fna --ko_hmm --ec --pfam --orthologs --threads 12 --log /path_to_file/08annotate_enrichm_AOA/LOG
[2019-06-20 09:29:43 AM] INFO: Running the annotate pipeline
[2019-06-20 09:29:43 AM] INFO: Running pipeline: annotate
[2019-06-20 09:29:43 AM] INFO: Setting up for genome annotation
[2019-06-20 09:29:43 AM] INFO: Calling proteins for annotation
[2019-06-20 09:29:43 AM] INFO: Preparing genomes for annotation
[2019-06-20 09:29:43 AM] INFO: - Calling proteins for 1 genomes
[2019-06-20 09:30:34 AM] INFO: Starting annotation:
[2019-06-20 09:30:34 AM] INFO: - Annotating genomes with hypothetical clusters
[2019-06-20 09:30:34 AM] INFO: - Generating MMSeqs2 database
[2019-06-20 09:30:34 AM] INFO: - Clustering genome proteins

Please let me know how to fix it.

Cheers

Jun 24 '19 22:06 ganiatgithub

Hi,

Just wondering if there's any update on this?

Cheers

Jul 26 '19 01:07 ganiatgithub

Hi,

Sorry for hijacking the thread, but I am currently encountering something similar. From what I have seen, this problem is caused by the mmseqs2 database not being generated properly. I managed to get through this step by hardcoding the input faa and output db manually (line 398 in annotate.py in the libs), but ran into more problems downstream (with the genome_dicts).

Alternatively, you could skip this step by dropping --orthologs from your command.
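As a rough sketch, that would be the command from the first post with only --orthologs removed. The paths are the placeholders from that post, and DRY_RUN is just a hypothetical switch for this example (not an enrichm option) so that by default the command is printed rather than run:

```shell
#!/bin/sh
# Placeholder paths copied from the original post; adjust to your setup.
OUT=/path_to_output
GENOMES=/path_to_genomes
LOG=/path_to_out/LOG

# Same flags as the original command, minus --orthologs.
set -- enrichm annotate \
    --output "$OUT" \
    --genome_directory "$GENOMES" \
    --ko_hmm \
    --ec \
    --pfam \
    --threads 8 \
    --log "$LOG"

# Show the command that would be run.
CMD="$*"
echo "$CMD"

# DRY_RUN is a hypothetical variable for this sketch; set DRY_RUN=0 to execute.
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

I can't say whether this avoids the downstream problems, only that it skips the clustering step where the log stops.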

Cheers

Aug 12 '19 10:08 WardDeb
