bioperl-live icon indicating copy to clipboard operation
bioperl-live copied to clipboard

Bio::DB::Fasta errors on or skips seqs with ID '0'

Open cjfields opened this issue 8 years ago • 2 comments

See this link (may need to join MAKER list):

https://groups.google.com/d/topic/maker-devel/l3HmeRfgKjU/discussion

Letter reproduced below. The error seems to be a seq ID evaluating as FALSE.

Dear Maker,

I did run maker for the annotation of a denovo genome assembly. It works great for all scaffolds but 1.

Here is the error I get:

WARNING: Cannot find >0, trying to re-index the fasta.
stop here:0
ERROR: Fasta index error
 at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/GI.pm line 1622.
        GI::polish_exonerate('FastaChunk=HASH(0x5fd22b8)', 'FastaSeq=HASH(0x4607540)', 211829, '>scaffold52256_cov106', 'ARRAY(0x62a2240)', 'ARRAY(0x629f198)', '/scratch/beegfs/monthly/vrossier/temp/aoc_denovo_maker/Aocell...', 'a', '/software/SequenceAnalysis/SequenceAlignment/exonerate/2.2.0/...', ...) called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Process/MpiChunk.pm line 2491
        Process::MpiChunk::__ANON__() called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Error.pm line 415
        eval {...} called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Error.pm line 407
        Error::subs::try('CODE(0x45d40a8)', 'HASH(0x5fe4930)') called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Process/MpiChunk.pm line 4224
        Process::MpiChunk::_go('Process::MpiChunk=HASH(0x45ddfb0)', 'run', 'HASH(0x4e3c540)', 6, 3) called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/../lib/Process/MpiChunk.pm line 341
        Process::MpiChunk::run('Process::MpiChunk=HASH(0x45ddfb0)', 16) called at /scratch/beegfs/monthly/vrossier/SOFT/MAKER2/maker_mpich/bin/maker line 979
--> rank=16, hostname=dee-serv04.vital-it.ch
ERROR: Failed while polishing alt-ESTs
ERROR: Chunk failed at level:6, tier_type:3
FAILED CONTIG:scaffold52256_cov106

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold52256_cov106

Any thought how to solve this?

Thanks in advance, Victor Rossier

cjfields avatar Jul 07 '16 02:07 cjfields

We need to trace which versions of MAKER are being used in case this is an internal issue there, but my feeling is the problem is in BioPerl. The trace doesn't make it obvious where it's failing though.

From Victor:

Sure,

MAKER version 2.31.8

cheers,
Victor
...

Victor,

Can you post which version of MAKER you are using?

chris

cjfields avatar Jul 07 '16 15:07 cjfields

This has been whittled down to a simple logic issue with the below statement (and illustrates the problem w/ stringified overloading).

    while (my $seq = $stream->next_seq) {
        $count++;
    }

When overloading is in effect and uses the display_id(), this fails with a sequence of ID 0 b/c the statement will evaluate to false. The only way to get around this is to either shut off stringification overloading, either permanently in Bio::PrimarySeq::Fasta or by returning the raw (non-stringified) object in next_seq(). As this would very likely break downstream tools like MAKER i'm deferring for now, though it's worth testing this at some point. Of note, only $stream->next_seq() is affected, the sequence is still indexed and thus present and retrievable.

The temp workaround is the below, though it's probably a good idea to never name your sequences a literal 0.

    while (defined(my $seq = $stream->next_seq)) {
        $count++;
    }

cjfields avatar Sep 07 '16 03:09 cjfields