Augustus

chromosomes longer than 2.1 GB lead to crash

Open MarioStanke opened this issue 1 year ago • 2 comments

Apparently this is a result of a 4-byte int not allowing for positions that are 2^31 or larger. Error message:

examining piece 1..-1928087540 (-1928087540 bp)
terminate called after throwing an instance of 'std::bad_alloc'
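The negative length in that log line is consistent with a position wrapping around a 32-bit signed integer. Below is a minimal C++ sketch of the arithmetic, assuming the true value was below 2^32 and wrapped exactly once. The helpers `wrapToInt32` and `unwrapOnce` are hypothetical illustrations, not functions from the AUGUSTUS sources.

```cpp
#include <cstdint>

// Hypothetical helper: the value a 32-bit two's-complement int would hold
// after storing v, computed with well-defined 64-bit arithmetic.
constexpr std::int64_t wrapToInt32(std::uint64_t v) {
    return (v & 0xFFFFFFFFull) >= 0x80000000ull
        ? static_cast<std::int64_t>(v & 0xFFFFFFFFull) - 0x100000000ll  // top bit set
        : static_cast<std::int64_t>(v & 0xFFFFFFFFull);
}

// Hypothetical inverse, valid only under the single-wrap assumption:
// add 2^32 back to a negative logged value.
constexpr std::int64_t unwrapOnce(std::int64_t logged) {
    return logged < 0 ? logged + 0x100000000ll : logged;
}

// The logged length maps back to a sequence of about 2.37 Gbp, and vice versa.
static_assert(unwrapOnce(-1928087540LL) == 2366879756LL, "recovered length");
static_assert(wrapToInt32(2366879756ull) == -1928087540LL, "wrapped length");
```

Under that assumption the sequence would be 2,366,879,756 bp long, and a negative length reaching an allocation would plausibly explain the `std::bad_alloc`.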

MarioStanke, Aug 11 '22 10:08

Apparently this is not completely solved, at least when predictions are requested on the complete chromosome in one run (rather than using --predictionStart and --predictionEnd):

$AUGUSTUS --species=rice --softmasking=0 --protein=on --codingseq=on --progress=true --gff3=on --alternatives-from-evidence=false --alternatives-from-sampling=false --extrinsicCfgFile=$EXCFFILE $GENOME_PART

leads to a segmentation fault after ~10k minutes of compute time.

examining piece 2147286171..-2147481126 
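The two numbers in that progress line can be checked against the 32-bit limit. The sketch below assumes the end position wrapped past 2^31 exactly once. The names `kBegin`, `kLoggedEnd`, `kTrueEnd` and `crossesInt32Max` are hypothetical and only serve the illustration.

```cpp
#include <cstdint>
#include <limits>

// Values copied from the progress line above.
constexpr std::int64_t kBegin = 2147286171LL;
constexpr std::int64_t kLoggedEnd = -2147481126LL;

// Assumption: the end wrapped once, so adding 2^32 recovers the intended value.
constexpr std::int64_t kTrueEnd = kLoggedEnd + (std::int64_t{1} << 32);

// Hypothetical check: does the inclusive range [begin, end] cross INT32_MAX?
constexpr bool crossesInt32Max(std::int64_t begin, std::int64_t end) {
    return begin <= std::numeric_limits<std::int32_t>::max()
        && end > std::numeric_limits<std::int32_t>::max();
}

static_assert(kTrueEnd == 2147486170LL, "recovered end position");
static_assert(kTrueEnd - kBegin + 1 == 200000, "piece length in bp");
static_assert(crossesInt32Max(kBegin, kTrueEnd), "piece straddles 2^31 - 1");
```

If the assumption holds, this is a 200,000 bp piece whose start still fits in an int while its end does not, which would match a crash at the first piece that crosses position 2^31.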

MarioStanke, Aug 19 '22 09:08

How about changing the type of beginPos, endPos, seqlen, restlen, and the return value of getNextCutEndPoint from int to long in namgene.cc?

diff namgene.cc namgene.cc.org 
536,537c536,537
<   long endPos, beginPos;
<   long seqlen = strlen(dna);
---
>   int endPos, beginPos;
>   int seqlen = strlen(dna);
972,973c972,973
< long NAMGene::getNextCutEndPoint(const char *dna, long beginPos, int maxstep, SequenceFeatureCollection& sfc){
<   long restlen = strlen(dna+beginPos);
---
> int NAMGene::getNextCutEndPoint(const char *dna, int beginPos, int maxstep, SequenceFeatureCollection& sfc){
>   int restlen = strlen(dna+beginPos);

Using long would increase the memory requirements. I haven't encountered this error, so sorry if it doesn't work.
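One portability caveat on the choice of type: the C++ standard only guarantees that `long` is at least 32 bits wide. It is 64 bits on typical 64-bit Linux and macOS builds but 32 bits on 64-bit Windows. A fixed-width type avoids depending on that. The sketch below uses a hypothetical alias `SeqPos` and a hypothetical helper `remainingLength`. Neither is taken from namgene.cc, and the helper only mirrors the `strlen(dna+beginPos)` expression in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical alias for sequence coordinates; not a type from the AUGUSTUS sources.
using SeqPos = std::int64_t;

static_assert(sizeof(SeqPos) == 8, "SeqPos is 64 bits on every platform");

// Hypothetical helper: length of the sequence remaining after beginPos,
// kept in 64 bits instead of being narrowed from size_t to int.
SeqPos remainingLength(const char* dna, SeqPos beginPos) {
    return static_cast<SeqPos>(std::strlen(dna + beginPos));
}
```

For example, `remainingLength("ACGTACGT", 3)` returns 5. Widening the few scalar variables in the diff costs only a few bytes each. Memory use would grow noticeably only if per-position arrays were also switched to a wider element type.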

piroyon, Aug 20 '22 04:08