Augustus
GBProcessor::getGeneList(): Could not read the following line in Genbank file.
@KatharinaHoff Hi, I am using AUGUSTUS etraining to train on a very large genome, approximately 10 Gb. My training set was selected from TransDecoder results, and my gene models are very long, longer than 1 Mbp. I got the following error:
GBProcessor::getGeneList(): Could not read the following line in Genbank file.
gt ccacctataa taatcatatc ttatttaaaa atcatatgtt
Maximum line length is
10000.
Encountered error after reading 2455 annotations.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Is there any limitation in etraining when it is used on a big genome or with long gene models?
I have the same problem.
In the current code we have the default
genbank.hh:#define GBMAXLINELEN 40000
If a single line in a Genbank file has more characters than that then you must use line breaks.
Genbank format is human readable and therefore lines are broken, usually after about 100 characters. Alternatively, you can up the 40000 in your code and recompile.
@MarioStanke
Hi, I got the same error when using etraining. So I changed 40000 to 500000 in genbank.hh (#define GBMAXLINELEN 40000) and recompiled, but the error occurred again:
GBProcessor::getGeneList(): Could not read the following line in Genbank file.
tgct ccagtttcag acaaaccata
Maximum line length is
499998.
I checked the GenBank file. Each line holds 60 bp and none of the sequences is longer than 500000, so this approach does not work. Do you have any other suggestions? Looking forward to your reply.
Please double-check that none of the lines is longer than the maximum. If one is, increase the maximum or simply introduce line breaks in the file. Usually GenBank files have limited line lengths.
The lines are broken and the maximum line length in the GenBank file is 81.
The sequences in my data are very long, so I increased the maximum line limit to 1000000. The error is the same.
So I am not sure what causes the error.
I ran into the same issue with AUGUSTUS v3.4.0. I set the maximum line length to 400000, but that still did not fix my problem.
I wonder if there is a problem in the script itself...
I am also encountering the same error. The GenBank file in which the error occurs is an intermediate file created by BRAKER. It has regular line breaks every 60 bp. If I use head to cut the file off at the line that triggers the error, the result is exactly 2 GB. Our GenBank file is 4 GB. Thank you.
GBProcessor::getGeneList(): Could not read the following line in Genbank file.
tcaaaatttt tacacaaata caaaaaagct aggttaaagc aacaaggata tattaacact
Maximum line length is
39998.
grep -n 'tcaaaatttt tacacaaata caaaaaagct aggttaaagc aacaaggata tattaacact' tmp_opt_Sp_1/curtrain-6
28289942: 1081 tcaaaatttt tacacaaata caaaaaagct aggttaaagc aacaaggata tattaacact
head -n 28289942 tmp_opt_Sp_1/curtrain-6 > tmpgb
ls -lh tmpgb
-rw-r--r-- 1 xxxxxx xxxxxx 2.0G 8月 15 18:20 tmpgb
Finally, we found a solution.
diff genbank.cc genbank.cc.org
677c677
< long fposb, fpose;
---
> int fposb, fpose;
The maximum value of a 32-bit int in C++ is 2,147,483,647 (2^31 - 1), so it cannot hold a position in the file beyond that byte offset.
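The limit described here can be illustrated with a small sketch. It assumes a platform where int is 32 bits wide, which is the common case; `fitsInInt` and `kTwoGiB` are hypothetical names used only for illustration and do not appear in genbank.cc.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a byte offset can be stored in a plain int
// without overflow. Assumes nothing about AUGUSTUS internals.
bool fitsInInt(std::int64_t offset) {
    return offset >= std::numeric_limits<int>::min()
        && offset <= std::numeric_limits<int>::max();
}

// The first byte offset past 2 GiB (2^31), one more than a 32-bit int can hold.
constexpr std::int64_t kTwoGiB = std::int64_t{1} << 31;
```

With a 32-bit int, `fitsInInt(2147483647)` holds but `fitsInInt(kTwoGiB)` does not, which matches the failure appearing right at the 2 GB mark. Note that `long` is 64 bits on typical 64-bit Linux systems but only 32 bits on 64-bit Windows, so a type such as `std::streamoff` or `std::int64_t` might be a more portable choice for file positions.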
Sorry, I just noticed a hint here. https://github.com/Gaius-Augustus/Augustus/issues/353
Good job. I'll try your method. Thanks.
The error message that included the maximal line length was also shown when the (implicit) maximal input file size was exceeded. This was about 2.1 GB on many machines. I have used Hiroyo's solution after checking that it works on files with more than 2^31 bytes. Please let me know if you still see this error after checking out the new version (master branch).