
Is there a way to "restart" DRAM after it stops on an error?

francis29029 opened this issue 4 years ago • 9 comments

Hello,

I launched DRAM on 300 MAGs and it stopped after about 200 MAGs :-( (see the error message below). Is there a way to "restart" DRAM from where it stopped? (If I simply re-launch the DRAM command, it does not work.)

I do not know if this is due to insufficient memory (I ran it with 512 GB) or something else. Any ideas?

Thanks

Francis

Traceback (most recent call last):
  File "/Software/python/Anaconda3-2020.11-DRAM/envs/DRAM/bin/DRAM.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/Software/python/Anaconda3-2020.11-DRAM/DRAM/scripts/DRAM.py", line 146, in <module>
    args.func(**args_dict)
  File "/Software/python/Anaconda3-2020.11-DRAM/DRAM/mag_annotator/annotate_bins.py", line 966, in annotate_bins_cmd
    annotate_bins(fasta_locs, output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
  File "/nihs/Software/python/Anaconda3-2020.11-DRAM/DRAM/mag_annotator/annotate_bins.py", line 1003, in annotate_bins
    all_annotations = annotate_fastas(fasta_locs, output_dir, db_locs, db_handler, min_contig_size, prodigal_mode,
  File "/Software/python/Anaconda3-2020.11-DRAM/DRAM/mag_annotator/annotate_bins.py", line 922, in annotate_fastas
    annotations_list.append(annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_locs, db_handler, min_contig_size,
  File "/Software/python/Anaconda3-2020.11-DRAM/DRAM/mag_annotator/annotate_bins.py", line 823, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_locs, tmp_dir, start_time, db_handler, custom_db_locs, bit_score_threshold,
  File "/Software/python/Anaconda3-2020.11-DRAM/DRAM/mag_annotator/annotate_bins.py", line 734, in annotate_orfs
    annotation_list.append(run_hmmscan_kofam(gene_faa, db_locs['kofam'], tmp_dir,
  File "/Software/python/Anaconda3-2020.11-DRAM/DRAM/mag_annotator/annotate_bins.py", line 236, in run_hmmscan_kofam
    run_process(['hmmsearch', '--domtblout', output, '--cpu', str(threads), kofam_hmm, gene_faa], verbose=verbose)
  File "/Software/python/Anaconda3-2020.11-DRAM/DRAM/mag_annotator/utils.py", line 38, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/Software/python/Anaconda3-2020.11-DRAM/envs/DRAM/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['hmmsearch', '--domtblout', 'dram_annotate_full_restart_1024_hpc16/working_dir/MAG137/tmp/kofam_profile.b6', '--cpu', '10', '/Software/python/Anaconda3-2020.11-DRAM/DRAM_data_1/kofam_profiles.hmm', 'dram_annotate_full_restart_1024_hpc16/working_dir/MAG137/tmp/genes.faa']' died with <Signals.SIGSEGV: 11>.

francis29029 avatar Jan 07 '21 14:01 francis29029

The easiest way is to figure out which MAGs DRAM completed, move their outputs somewhere else, move the completed inputs somewhere else, then re-run DRAM on the MAGs that had not been started/completed.

The final DRAM outputs are just a concatenation (cat) of the tRNA, rRNA and annotation.tsv files from each individual MAG.
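A minimal sketch of the merge step described above, assuming each per-run output directory contains an annotations.tsv with an identical header row (the directory layout and file names here are hypothetical, not DRAM defaults):

```python
from pathlib import Path

def merge_annotations(run_dirs, merged_path):
    """Concatenate annotations.tsv files from several DRAM runs,
    writing the shared header row only once."""
    header_written = False
    with open(merged_path, "w") as out:
        for run_dir in run_dirs:
            tsv = Path(run_dir) / "annotations.tsv"
            with open(tsv) as fh:
                header = fh.readline()
                if not header_written:
                    out.write(header)
                    header_written = True
                # copy the remaining data rows verbatim
                for line in fh:
                    out.write(line)
```

The same pattern applies to the tRNA and rRNA tables; only the file name changes.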

mw55309 avatar Feb 03 '21 11:02 mw55309

I'm thinking about making a Snakemake workflow that runs DRAM annotate on each MAG separately.

SilasK avatar Feb 03 '21 16:02 SilasK

Yes, I have a Snakemake workflow that breaks the input MAGs up into chunks (e.g. of 50 MAGs each) and runs DRAM on the chunks.
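The chunking idea above can be sketched in a few lines of Python; each chunk would then become one DRAM annotate job (e.g. one Snakemake rule instance). The chunk size and the idea of feeding chunks to separate jobs are from the comment; the function name is hypothetical:

```python
def chunk_mags(fasta_paths, chunk_size=50):
    """Yield successive chunks of at most chunk_size MAG fasta paths."""
    for i in range(0, len(fasta_paths), chunk_size):
        yield fasta_paths[i:i + chunk_size]
```

For 300 MAGs with the default chunk size this yields six chunks of 50, so a crash in one chunk only costs that chunk's work.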

mw55309 avatar Feb 03 '21 17:02 mw55309

@mw55309 has it right. @francis29029, I have it on my roadmap to support resuming annotation, since all the data is still there. Have you tried annotating the MAG where annotation breaks on its own, to see if you can reproduce the issue? It has come up a few times, but I haven't been able to reproduce it on my end.

shafferm avatar Feb 17 '21 21:02 shafferm

Hey @shafferm, you are back! We thought about using Snakemake to implement the resume option for DRAM. This would also allow distributing the workflow and speeding it up; e.g. 100 genomes could be annotated in parallel. What do you think?

SilasK avatar Feb 18 '21 07:02 SilasK

I am! Sorry for the extended absence. I think using Snakemake could be a good idea. Is your idea to run, say, 100 genomes at a time and then merge the annotations.tsv and other outputs at the end? It won't cause any issues with DRAM's functionality.

You might take a bit of a speed hit because some processing will have to happen twice, but that will probably be more than made up for by the fact that you will be multiplexing the hmmsearches, which don't seem to take advantage of all the processes they are given. It will also mean a big memory increase, since multiple databases will need to be loaded at once, but if you have the RAM then you might as well use it. DRAM is set up so that genomes are annotated separately, so the number of genomes you annotate at once will not affect the annotations assigned.

shafferm avatar Feb 18 '21 18:02 shafferm

I made a Snakemake workflow for DRAM. However, what I do now is run DRAM on one genome at a time and then merge the outputs manually.

Is there functionality in DRAM that could do the merging for me, so that it would be more robust to changes?

@shafferm You said you already made some changes for the KBase integration?

Obviously, I could adapt the Snakemake workflow and place it in this repo if you would like.

SilasK avatar May 22 '21 14:05 SilasK

@shafferm Is there no way to specify multiple genomes without taking everything in a folder?

SilasK avatar Sep 11 '21 23:09 SilasK

Looks like the answer to this is still yes and no, but mostly no. The -i/--input_faa argument passes its value to the Python glob function, so if you can match what you want to run with a glob pattern, you can run any subset you want. This is a known limitation, but because it is usually easy, if not always wise, to move or rename files that should be processed separately, it is not a priority.
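Since the input argument is expanded with Python's glob, selecting a subset works like this (the directory layout and MAG names here are hypothetical, purely to illustrate the pattern syntax):

```python
import glob

def select_mags(pattern):
    """Return the sorted list of fasta files matching a glob pattern,
    as DRAM's glob-based input expansion would see them."""
    return sorted(glob.glob(pattern))

# e.g. "mags/MAG1[0-4]?.fa" matches MAG100.fa through MAG149.fa,
# skipping MAG150.fa and above in the same folder.
```

Note these are glob wildcards (`*`, `?`, `[...]`), not full regular expressions, so complex subsets may still require moving or renaming files.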

rmFlynn avatar Sep 13 '21 22:09 rmFlynn