
Augustus 3.3.3 intensive memory requirement

Open • PeiwenLi opened this issue 3 years ago • 1 comment

Hi! I am running Augustus/3.3.3 with a hints file on a large FASTA query (155G) on our HPC system, and the program failed for lack of memory even though I requested 100G of RAM. I tested the program with example.fa, which worked well, and with a subset of my FASTA query (150 sequences), which failed. Is it normal for Augustus to require this much RAM, and if so, is there a way to work around it (e.g. by multi-threading)? Sincere thanks, Peiwen

PeiwenLi, Aug 17 '20 00:08

If you have a large genome and a large hints input, you can reduce memory consumption as far as you need by

  • splitting the genome into chunks and
  • restricting the hints for each run to those that refer to that chunk.

Most of the time, it is sufficient to split the genome by complete sequences only, e.g. into files chr1.fa, chr2.fa, ..., and then to compile matching hints files hints-chr1.gff, hints-chr2.gff, ...
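
To illustrate splitting by complete sequences, here is a minimal Python sketch. It is not part of AUGUSTUS; the function names split_fasta and split_hints are made up for this example. It assumes that sequence names are safe to use as file names, that the hints are GFF with the sequence name in the first tab-separated column, and that the number of sequences is small enough to keep one open file per sequence.

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write every FASTA record to out_dir/<name>.fa and return {name: path}.

    The name is the first whitespace-delimited word of the header line.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    out = None
    with open(fasta_path) as fin:
        for line in fin:
            if line.startswith(">"):
                # A new record starts: close the previous file, open the next.
                if out is not None:
                    out.close()
                name = line[1:].split()[0]
                paths[name] = os.path.join(out_dir, name + ".fa")
                out = open(paths[name], "w")
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return paths


def split_hints(hints_path, out_dir, names):
    """Write hints for each sequence in `names` to out_dir/hints-<name>.gff.

    Comment lines, blank lines and hints for other sequences are skipped.
    Returns {name: path} for the sequences that actually had hints.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        with open(hints_path) as fin:
            for line in fin:
                if line.startswith("#") or not line.strip():
                    continue
                seq = line.split("\t", 1)[0]
                if seq not in names:
                    continue
                if seq not in handles:
                    path = os.path.join(out_dir, "hints-" + seq + ".gff")
                    handles[seq] = open(path, "w")
                handles[seq].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return {seq: handle.name for seq, handle in handles.items()}
```

Calling split_fasta("genome.fa", "seqs") and then split_hints("hints.gff", "hints", set(paths)) with the returned dictionary would give one FASTA file and one hints file per sequence, so each AUGUSTUS run only loads its own part. For assemblies with many thousands of scaffolds, grouping several sequences per file is probably more practical than one file each.
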

If your chromosomes are so large that you want to split them up, you can use createAugustusJoblist.pl, which splits chromosomes into overlapping chunks. The following command, described in the book chapter https://math-inf.uni-greifswald.de/storages/uni-greifswald/fakultaet/mnf/mathinf/stanke/augustus_wrp.pdf, may be helpful:

createAugustusJoblist.pl --sequences=chr.lst --wrap="#" --overlap=100000 \
--chunksize=1100000 --outputdir=$augDir/ --joblist=jobs.lst \
--jobprefix=${myPrefix}_ --partitionHints --command "$augCall"
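
To get a feeling for what --chunksize=1100000 and --overlap=100000 mean, here is a small Python sketch of overlapping chunks. It is not the code of createAugustusJoblist.pl: it assumes that consecutive chunks start chunksize - overlap bases apart and that the last chunk is simply cut off at the sequence end. The script's exact boundaries may differ, and overlapping_chunks is a hypothetical name.

```python
def overlapping_chunks(seq_len, chunksize, overlap):
    """Return 1-based, inclusive (start, end) chunks covering 1..seq_len.

    Assumed scheme (not taken from createAugustusJoblist.pl): consecutive
    chunks start chunksize - overlap bases apart, so neighbours share
    `overlap` bases, and the last chunk ends at seq_len.
    """
    if chunksize <= 0 or not 0 <= overlap < chunksize:
        raise ValueError("need chunksize > 0 and 0 <= overlap < chunksize")
    chunks = []
    if seq_len < 1:
        return chunks
    step = chunksize - overlap
    start = 1
    while True:
        end = min(start + chunksize - 1, seq_len)
        chunks.append((start, end))
        if end == seq_len:
            break
        start += step
    return chunks
```

Under these assumptions a 2.5 Mb sequence gives three chunks, 1-1100000, 1000001-2100000 and 2000001-2500000. A gene that is cut by one chunk border lies completely inside the neighbouring chunk as long as it is shorter than the overlap.
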

Usage of createAugustusJoblist.pl:

parameters:
--sequences seqs.lst input sequences, format: each line contains one sequence including the full path and its size, e.g.
                     /cluster/data/panTro2/1/chr1.fa    1       229974691
                     /cluster/data/panTro2/1/chr1_random        1       9420409
                     /cluster/data/panTro2/2/chr2a      1       114460064
                     or
                     /cluster/data/panTro2/1/chr1_random        /hints/chr1_random      1       9420409
                     /cluster/data/panTro2/2/chr2a      /hints/chr2a    1       114460064
--outputdir s        directory, in which later the AUGUSTUS output will be written.
--command s          AUGUSTUS command, e.g. "augustus --species=human --maxDNAPieceSize=600000".
--joblist job.lst    filename with list of jobs as given to parasol.
--chunksize n        chunk size. Each sequence is (imaginarily) cut into chunks of this size

options:
--overlap n          overlap. Neighboring chunks overlap by this number of bases.
--padding n          padding on both sides (default 0).
--errordir errdir    directory, in which later the AUGUSTUS error messages will be written.
--check              insert parasol input/output checks.
--wrap=s             have each job in a separate file, preceded by command s.
--jobprefix=s        prefix of job name (default: "job.")
--partitionHints     partition hints files according to genomic locus of single augustus runs,
                     add a command to the augustus job that will create and delete this hints file
                     in the output directory of the augustus job. This option also will automatically
                     delete empty error files of augustus.
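
The file given to --sequences (chr.lst in the command above) can be generated rather than written by hand. Below is a minimal Python sketch, not part of AUGUSTUS, with the hypothetical helper names fasta_length and sequence_list_line. It assumes one sequence per FASTA file and writes tab-separated columns; the help text only shows whitespace-separated columns, so the separator is an assumption.

```python
import os


def fasta_length(path):
    """Count the sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(path) as fin:
        for line in fin:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def sequence_list_line(fasta_path, hints_path=None):
    """Build one line for the --sequences list.

    Columns: full path to the sequence, optionally the hints file,
    then start (1) and size, following the two layouts in the help text.
    """
    cols = [os.path.abspath(fasta_path)]
    if hints_path is not None:
        cols.append(os.path.abspath(hints_path))
    cols += ["1", str(fasta_length(fasta_path))]
    return "\t".join(cols)
```

Writing sequence_list_line(f) for every per-chromosome file, one line each, into chr.lst would then give the input for --sequences.
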

MarioStanke, Aug 17 '20 17:08