bioperl-live
bioperl-live copied to clipboard
Bio::DB::Fasta errors on or skips seqs with ID '0'
See this link (may need to join MAKER list):
https://groups.google.com/d/topic/maker-devel/l3HmeRfgKjU/discussion
Letter reproduced below. The error seems to be a seq ID evaluating as FALSE.
Dear Maker,
I did run maker for the annotation of a denovo genome assembly. It works great for all scaffolds but 1.
Here is the error I get:
WARNING: Cannot find >0, trying to re-index the fasta.
stop here:0
ERROR: Fasta index error
at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/GI.pm line 1622.
GI::polish_exonerate('FastaChunk=HASH(0x5fd22b8)', 'FastaSeq=HASH(0x4607540)', 211829, '>scaffold52256_cov106', 'ARRAY(0x62a2240)', 'ARRAY(0x629f198)', '/scratch/beegfs/monthly/vrossier/temp/aoc_denovo_maker/Aocell...', 'a', '/software/SequenceAnalysis/SequenceAlignment/exonerate/2.2.0/...', ...) called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Process/MpiChunk.pm line 2491
Process::MpiChunk::__ANON__() called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Error.pm line 415
eval {...} called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Error.pm line 407
Error::subs::try('CODE(0x45d40a8)', 'HASH(0x5fe4930)') called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Process/MpiChunk.pm line 4224
Process::MpiChunk::_go('Process::MpiChunk=HASH(0x45ddfb0)', 'run', 'HASH(0x4e3c540)', 6, 3) called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Process/MpiChunk.pm line 341
Process::MpiChunk::run('Process::MpiChunk=HASH(0x45ddfb0)', 16) called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/maker line 979
--> rank=16, hostname=dee-serv04.vital-it.ch
ERROR: Failed while polishing alt-ESTs
ERROR: Chunk failed at level:6, tier_type:3
FAILED CONTIG:scaffold52256_cov106
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold52256_cov106
Any thought how to solve this?
Thanks in advance, Victor Rossier
We need to trace which versions of MAKER are being used in case this is an internal issue there, but my feeling is the problem is in BioPerl. The trace doesn't make it obvious where it's failing though.
From Victor:
Sure,
MAKER version 2.31.8
cheers,
Victor
...
Victor,
Can you post which version of MAKER you are using?
chris
This has been whittled down to a simple logic issue with the below statement (and illustrates the problem w/ stringified overloading).
while (my $seq = $stream->next_seq) {
$count++;
}
When overloading is in effect and uses the display_id()
, this fails with a sequence of ID 0 b/c the statement will evaluate to false. The only way to get around this is to either shut off stringification overloading, either permanently in Bio::PrimarySeq::Fasta
or by returning the raw (non-stringified) object in next_seq()
. As this would very likely break downstream tools like MAKER i'm deferring for now, though it's worth testing this at some point. Of note, only $stream->next_seq()
is affected, the sequence is still indexed and thus present and retrievable.
The temp workaround is the below, though it's probably a good idea to never name your sequences a literal 0.
while (defined(my $seq = $stream->next_seq)) {
$count++;
}