Augustus 3.3.3 intensive memory requirement
Hi! I am running Augustus/3.3.3 with a hints file on a large FASTA query (155G) on our HPC system, and the program failed for lack of memory even though I requested 100G of RAM. I tested the program with example.fa, which worked well, and with a subset of my FASTA query (150 sequences), which failed. Is it normal for Augustus to require this much RAM, and if so, is there a way to work around it (e.g. by multi-threading)? Sincere thanks, Peiwen
If you have a large genome and a large hints file, you can reduce memory consumption as far as you need by
- splitting the genome into chunks and
- reducing the hints for each run to those that refer to its input chunk only.
Most of the time, it is sufficient to split the genome by complete sequences only, e.g. into files chr1.fa, chr2.fa, ..., and then to compile matching hints files hints-chr1.gff, hints-chr2.gff, ...
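The per-sequence split described above can be sketched in a few lines of Python. This is only an illustration, not part of AUGUSTUS: the function names split_fasta and split_hints are made up here, and the sketch assumes that sequence names (the header text up to the first whitespace) are safe to use as file names and that the hints file is tab-separated GFF with the sequence name in column 1.

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write each sequence of a multi-FASTA file to <out_dir>/<name>.fa.

    The file is streamed line by line, so memory use stays small even
    for a very large genome. Returns the sequence names in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    names = []
    out = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                # Assumption: the name is the header up to the first whitespace.
                name = line[1:].split()[0]
                names.append(name)
                out = open(os.path.join(out_dir, name + ".fa"), "w")
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return names


def split_hints(gff_path, out_dir):
    """Partition a GFF hints file into <out_dir>/hints-<seqname>.gff files.

    Only one output file is open at a time, so the number of sequences is
    not limited by the open-file limit. Unsorted input is handled by
    reopening a file in append mode. Returns the sorted sequence names.
    """
    os.makedirs(out_dir, exist_ok=True)
    seen = set()
    current, out = None, None
    with open(gff_path) as gff:
        for line in gff:
            if line.startswith("#") or not line.strip():
                continue  # skip comments and blank lines
            seqname = line.split("\t", 1)[0]
            if seqname != current:
                if out is not None:
                    out.close()
                # Overwrite on first sight, append on later visits.
                mode = "a" if seqname in seen else "w"
                seen.add(seqname)
                path = os.path.join(out_dir, "hints-" + seqname + ".gff")
                out = open(path, mode)
                current = seqname
            out.write(line)
    if out is not None:
        out.close()
    return sorted(seen)
```

Calling split_fasta("genome.fa", "seqs") and split_hints("hints.gff", "hints") would then give one FASTA file and one hints file per sequence, and each pair can be passed to its own augustus run.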
If your chromosomes are so large that you want to split them up as well, you can use createAugustusJoblist.pl, which splits chromosomes into overlapping chunks. The command below may be helpful. It is described in the book chapter
https://math-inf.uni-greifswald.de/storages/uni-greifswald/fakultaet/mnf/mathinf/stanke/augustus_wrp.pdf
createAugustusJoblist.pl --sequences=chr.lst --wrap="#" --overlap=100000 \
  --chunksize=1100000 --outputdir=$augDir/ --joblist=jobs.lst \
  --jobprefix=${myPrefix}_ --partitionHints --command "$augCall"
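To get a feel for what --chunksize=1100000 combined with --overlap=100000 means, here is a small Python sketch of one plausible chunking scheme. It is an assumption for illustration only: it starts each chunk chunksize minus overlap bases after the previous one, and the exact boundaries chosen by createAugustusJoblist.pl may differ. The function name chunk_intervals is made up.

```python
def chunk_intervals(seq_len, chunksize, overlap):
    """Return 1-based inclusive (start, end) chunks covering 1..seq_len.

    Assumed scheme: consecutive chunks start (chunksize - overlap) bases
    apart, so neighboring chunks share `overlap` bases. The last chunk is
    clipped at the end of the sequence.
    """
    if not 0 <= overlap < chunksize:
        raise ValueError("overlap must be >= 0 and smaller than chunksize")
    if seq_len < 1:
        return []
    step = chunksize - overlap
    chunks = []
    start = 1
    while True:
        end = min(start + chunksize - 1, seq_len)
        chunks.append((start, end))
        if end == seq_len:
            break
        start += step
    return chunks
```

Under this scheme a 2,500,000 bp sequence yields three chunks: 1 to 1100000, 1000001 to 2100000, and 2000001 to 2500000. Each augustus run then only has to hold one chunk and its hints in memory, and a gene that straddles a chunk boundary has a chance of lying completely inside the overlap with the neighboring chunk.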
createAugustusJoblist.pl
parameters:
--sequences seqs.lst input sequences, format: each line contains one sequence including the full path and its size, e.g.
    /cluster/data/panTro2/1/chr1.fa 1 229974691
    /cluster/data/panTro2/1/chr1_random 1 9420409
    /cluster/data/panTro2/2/chr2a 1 114460064
  or
    /cluster/data/panTro2/1/chr1_random /hints/chr1_random 1 9420409
    /cluster/data/panTro2/2/chr2a /hints/chr2a 1 114460064
--outputdir s directory, in which later the AUGUSTUS output will be written.
--command s AUGUSTUS command, e.g. "augustus --species=human --maxDNAPieceSize=600000".
--joblist job.lst filename with list of jobs as given to parasol.
--chunksize n chunk size. Each sequence is (imaginarily) cut into chunks of this size
options:
--overlap n overlap. Neighboring chunks overlap by this number of bases.
--padding n padding on both sides (default 0).
--errordir errdir directory, in which later the AUGUSTUS error messages will be written.
--check insert parasol input/output checks.
--wrap=s have each job in a separate file, preceded by command s.
--jobprefix=s prefix of job name (default: "job.")
--partitionHints partition hints files according to genomic locus of single augustus runs,
    add a command to the augustus job that will create and delete this hints file
    in the output directory of the augustus job. This option also will automatically
    delete empty error files of augustus.
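The sequence list expected by --sequences can be generated rather than typed by hand. The following Python sketch is one way to do it under stated assumptions: each FASTA file holds exactly one sequence, the two numbers on each line are the start (1) and the sequence length as in the examples above, and the script accepts tab-separated fields (the usage text only shows whitespace). The names fasta_length, write_sequence_list and hints_for are hypothetical.

```python
import os


def fasta_length(path):
    """Count the bases in a FASTA file, ignoring headers and whitespace."""
    total = 0
    with open(path) as fasta:
        for line in fasta:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def write_sequence_list(fasta_paths, list_path, hints_for=None):
    """Write one line per FASTA file: full path, [hints path,] 1, length.

    hints_for is an optional function mapping a FASTA path to its hints
    file, which produces the four-column variant of the list.
    """
    with open(list_path, "w") as out:
        for path in fasta_paths:
            fields = [os.path.abspath(path)]
            if hints_for is not None:
                fields.append(os.path.abspath(hints_for(path)))
            fields += ["1", str(fasta_length(path))]
            # Assumption: any whitespace separator is accepted; tab is used here.
            out.write("\t".join(fields) + "\n")
```

For example, write_sequence_list(["chr1.fa", "chr2.fa"], "chr.lst") would produce the three-column form, and passing a hints_for function would add the hints file as the second column.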