bioperl-run icon indicating copy to clipboard operation
bioperl-run copied to clipboard

'cpgplot' input sequence not recognized

Open piatekmj opened this issue 8 years ago • 9 comments

Session info: perl = v5.22.0 Mac OS = 10.11.6 BioPerl-Run-1.007001

From Bio::Factory::EMBOSS when trying to provide input sequence for cpgplot as a variable, I do get an error message saying: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat This seems to be composition dependent as I do not get that for every sequence provided. It seems this error only shows up for sequences that do have CpG island identified.

Also occasionally I do get error message saying: Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309

The above only applies for some cases not for all, I only get those messages for some sequences.

When I, however provide the input sequence with asis prefix like so: sequence => "asis:$sequence" program completes with output.

As long as we are on cpgplot, I would like to also mention that with plot => 'Y' provided, I do get another error message for sequences for which CpG islands have been identified:

*** PLPLOT ERROR ***
Unable to either (1) open/find or (2) allocate memory for the font file
Program aborted

This can be turned off by providing plot => 'N' if one is not interested in graphical output. I do have plplot installed with my package manager.

Any assistance would be appreciated.

piatekmj avatar Feb 24 '17 19:02 piatekmj

Marek,

Please show us some code, then we can take a look.

But I am noting that you said that when you use “asis:” your script works, presumably you are providing a string when you use “asis", like:

-sequencea => "asis:$sequence_str"

If you do not provide a string then your script should supply a Sequence object, like:

-sequencea => $seq_object

This “asis” business is built in to EMBOSS, so you can run a command this way from the command line:

water -asequence=asis:gatc -bsequence=asis:gatc

Otherwise, the EMBOSS applications expect file names, not strings, at the command line.

Brian O.

On Feb 24, 2017, at 2:15 PM, piatekmj [email protected] wrote:

Session info: perl = v5.22.0 Mac OS = 10.11.6 BioPerl-Run-1.007001

From Bio::Factory::EMBOSS when trying to provide input sequence for cpgplot as a variable, I do get an error message saying: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat This seems to be composition dependent as I do not get that for every sequence provided. It seems this error only shows up for sequences that do have CpG island identified.

Also occasionally I do get error message saying: Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309

The above only applies for some cases not for all, I only get those messages for some sequences.

When I, however provide the input sequence with asis prefix like so: sequence => "asis:$sequence" program completes with output.

As long as we are on cpgplot, I would like to also mention that with plot => 'Y' provided, I do get another error message for sequences for which CpG islands have been identified:

*** PLPLOT ERROR *** Unable to either (1) open/find or (2) allocate memory for the font file Program aborted This can be turned off by providing plot => 'N' if one is not interested in graphical output. I do have plplot installed with my package manager.

Any assistance would be appreciated.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99FI0NeyTblkRXdAYEvrMafQg9f3sks5rfyw2gaJpZM4MLkmS.

bosborne avatar Feb 25 '17 22:02 bosborne

Hi Brian,

The code I'm testing is as follows:

use Bio::Factory::EMBOSS;
use Bio::SeqIO;

my $genome = 'genome.fa';
my $inseq = Bio::SeqIO->new(    
    -file   => $genome,          
    -format => 'fasta',
    );

my %assembly;

while (my $seq = $inseq->next_seq) {
    $assembly{$seq->id} = $seq->seq;
}

my $factory = new Bio::Factory::EMBOSS;
my $prog = $factory -> program('cpgplot');

foreach my $frag (keys %assembly){

    print "Processing $frag\n";
    my %input = (
	-sequence => "asis:$assembly{$frag}",
	-outfile => "${frag}.cpgplot",
	-window => 100,
	-minlen => 200,
	-minoe => 0.6,
	-minpc => 50,
	-plot => 'N',
    );

    print "$frag\tCheck 1\t";
    $prog->run(\%input);
    print "Check 2\n";

}

The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.

I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.

With -sequence => $assembly{$frag}, I get some failures reported: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat or Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2

With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.

The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 or webserver2 (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)

Could there be something wrong with the installation maybe?

Thanks, Marek

Additional session info: print $factory->version(); # prints 6.5.7.0 print $factory->location(); # prints local

piatekmj avatar Feb 27 '17 14:02 piatekmj

Marek,

I don’t have your input file but I suspect that using a long string with returns in it is not working, as “-sequence". Try using Sequence objects, rather than strings:

use Bio::Factory::EMBOSS; use Bio::SeqIO;

my $genome = 'genome.fa'; my $inseq = Bio::SeqIO->new( -file => $genome, -format => 'fasta', );

my %assembly;

while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq; }

my $factory = new Bio::Factory::EMBOSS; my $prog = $factory -> program('cpgplot');

foreach my $frag (keys %assembly){

print "Processing $frag\n";
my %input = (
-sequence => $assembly{$frag},
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'N',
);

print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";

}

Brian O.

On Feb 27, 2017, at 9:56 AM, piatekmj [email protected] wrote:

Hi Brian,

The code I'm testing is as follows:

use Bio::Factory::EMBOSS; use Bio::SeqIO;

my $genome = 'genome.fa'; my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta', );

my %assembly;

while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq->seq; }

my $factory = new Bio::Factory::EMBOSS; my $prog = $factory -> program('cpgplot');

foreach my $frag (keys %assembly){

print "Processing $frag\n";
my %input = (

-sequence => "asis:$assembly{$frag}", -outfile => "${frag}.cpgplot", -window => 100, -minlen => 200, -minoe => 0.6, -minpc => 50, -plot => 'N', );

print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";

}

The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.

I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.

With -sequence => $assembly{$frag}, I get some failures reported: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat or Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2

With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.

The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot or webserver2 http://www.ebi.ac.uk/Tools/seqstats/emboss_cpgplot/ (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)

Could there be something wrong with the installation maybe?

Thanks, Marek

Additional session info: print $factory->version(); # prints 6.5.7.0 print $factory->location(); # prints local

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282742864, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99Bqc9Uh6NYpUW6iHX5mML-xp38yrks5rguQDgaJpZM4MLkmS.

bosborne avatar Feb 27 '17 15:02 bosborne

Marek,

OK, I see you tried using Sequence objects. How long are the sequences in “genome.fa”?

Brian O.

On Feb 27, 2017, at 9:56 AM, piatekmj [email protected] wrote:

Hi Brian,

The code I'm testing is as follows:

use Bio::Factory::EMBOSS; use Bio::SeqIO;

my $genome = 'genome.fa'; my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta', );

my %assembly;

while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq->seq; }

my $factory = new Bio::Factory::EMBOSS; my $prog = $factory -> program('cpgplot');

foreach my $frag (keys %assembly){

print "Processing $frag\n";
my %input = (

-sequence => "asis:$assembly{$frag}", -outfile => "${frag}.cpgplot", -window => 100, -minlen => 200, -minoe => 0.6, -minpc => 50, -plot => 'N', );

print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";

}

The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.

I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.

With -sequence => $assembly{$frag}, I get some failures reported: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat or Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2

With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.

The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot or webserver2 http://www.ebi.ac.uk/Tools/seqstats/emboss_cpgplot/ (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)

Could there be something wrong with the installation maybe?

Thanks, Marek

Additional session info: print $factory->version(); # prints 6.5.7.0 print $factory->location(); # prints local

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282742864, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99Bqc9Uh6NYpUW6iHX5mML-xp38yrks5rguQDgaJpZM4MLkmS.

bosborne avatar Feb 27 '17 15:02 bosborne

They are not very long. See the lengths of the sequences below: 1223259 unitig_9 1087134 unitig_0 934178 unitig_1 660843 unitig_2 453028 unitig_5 312397 unitig_6 114250 unitig_11 80635 unitig_3 66509 unitig_7 40100 unitig_4 6932 unitig_8 2384 unitig_10

Interestingly (or not), I only get reported results from 'unitig_11' down. Is there a sequence length limits imposed on cpgplot and other EMBOSS scripts?

piatekmj avatar Feb 27 '17 16:02 piatekmj

Brian, it seems I was able to overcome the problem by using your suggestion of using sequence objects.

while (my $seq = $inseq->next_seq) {
    $assembly{$seq->id} = $seq;
}

Then later on using -sequence => $assembly{$frag} produces the exact output as from example webserver, without any error if used in conjunction with -plot => 'N'.

It seems the major issue is now solved, leaving me wonder why was the message so cryptic? If you would be able to provide any assistance with regards to the PLPLOT issue, it would be great.

piatekmj avatar Feb 27 '17 17:02 piatekmj

Marek,

Excellent. Regarding plplot, are you sure that the executable is in your PATH? I think it’s called pltek.

Brian O.

On Feb 27, 2017, at 12:05 PM, piatekmj [email protected] wrote:

Brian, it seems I was able to overcome the problem by using your suggestion of using sequence objects.

while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq; } Then later on using -sequence => $assembly{$frag} produces the exact output as from example webserver, without any error if used in conjunction with -plot => 'N'.

It seems the major issue is now solved, leaving me wonder why was the message so cryptic? If you would be able to provide any assistance with regards to the PLPLOT issue, it would be great.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282782879, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99IGcHzVTQwcEqrxfK3VJDlYER4wMks5rgwJtgaJpZM4MLkmS.

bosborne avatar Feb 27 '17 20:02 bosborne

pltek is in my PATH indeed. If you instructed me, I could provide some more details regarding that.

piatekmj avatar Feb 27 '17 20:02 piatekmj

Marek,

I don’t know much about how plotting works in EMBOSS, but this seems to work for me:

my %input = (
-sequence => $assembly{$frag},
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'Y',
-graph => 'pdf'
);

Brian O.

On Feb 27, 2017, at 3:10 PM, piatekmj [email protected] wrote:

pltek is in my PATH indeed. If you instructed me, I could provide some more details regarding that.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282837915, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99Md6PnA3uo-3pTw65sKbJsDcM7cnks5rgy3DgaJpZM4MLkmS.

bosborne avatar Feb 27 '17 21:02 bosborne