bioperl-run
bioperl-run copied to clipboard
'cpgplot' input sequence not recognized
Session info: perl = v5.22.0 Mac OS = 10.11.6 BioPerl-Run-1.007001
From Bio::Factory::EMBOSS when trying to provide input sequence for cpgplot
as a variable, I do get an error message saying:
Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
This seems to be composition dependent as I do not get that for every sequence provided. It seems this error only shows up for sequences that do have CpG island identified.
Also occasionally I do get error message saying:
Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309
The above only applies for some cases not for all, I only get those messages for some sequences.
When I, however provide the input sequence with asis
prefix like so: sequence => "asis:$sequence"
program completes with output.
As long as we are on cpgplot, I would like to also mention that with plot => 'Y'
provided, I do get another error message for sequences for which CpG islands have been identified:
*** PLPLOT ERROR ***
Unable to either (1) open/find or (2) allocate memory for the font file
Program aborted
This can be turned off by providing plot => 'N'
if one is not interested in graphical output.
I do have plplot installed with my package manager.
Any assistance would be appreciated.
Marek,
Please show us some code, then we can take a look.
But I am noting that you said that when you use “asis:” your script works, presumably you are providing a string when you use “asis", like:
-sequencea => "asis:$sequence_str"
If you do not provide a string then your script should supply a Sequence object, like:
-sequencea => $seq_object
This “asis” business is built in to EMBOSS, so you can run a command this way from the command line:
water -asequence=asis:gatc -bsequence=asis:gatc
Otherwise, the EMBOSS applications expect file names, not strings, at the command line.
Brian O.
On Feb 24, 2017, at 2:15 PM, piatekmj [email protected] wrote:
Session info: perl = v5.22.0 Mac OS = 10.11.6 BioPerl-Run-1.007001
From Bio::Factory::EMBOSS when trying to provide input sequence for cpgplot as a variable, I do get an error message saying: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat This seems to be composition dependent as I do not get that for every sequence provided. It seems this error only shows up for sequences that do have CpG island identified.
Also occasionally I do get error message saying: Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309
The above only applies for some cases not for all, I only get those messages for some sequences.
When I, however provide the input sequence with asis prefix like so: sequence => "asis:$sequence" program completes with output.
As long as we are on cpgplot, I would like to also mention that with plot => 'Y' provided, I do get another error message for sequences for which CpG islands have been identified:
*** PLPLOT ERROR *** Unable to either (1) open/find or (2) allocate memory for the font file Program aborted This can be turned off by providing plot => 'N' if one is not interested in graphical output. I do have plplot installed with my package manager.
Any assistance would be appreciated.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99FI0NeyTblkRXdAYEvrMafQg9f3sks5rfyw2gaJpZM4MLkmS.
Hi Brian,
The code I'm testing is as follows:
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
my $genome = 'genome.fa';
my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta',
);
my %assembly;
while (my $seq = $inseq->next_seq) {
$assembly{$seq->id} = $seq->seq;
}
my $factory = new Bio::Factory::EMBOSS;
my $prog = $factory -> program('cpgplot');
foreach my $frag (keys %assembly){
print "Processing $frag\n";
my %input = (
-sequence => "asis:$assembly{$frag}",
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'N',
);
print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";
}
The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.
I have to have the -plot => 'N',
switch otherwise I get errors related to PLPLOT.
With -sequence => $assembly{$frag},
I get some failures reported:
Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
or
Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2
With the -sequence => "asis:$assembly{$frag}",
I get no errors but results differ from a webserver even when used with the same settings.
The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}"
and just -sequence => "$assembly{$frag}"
where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 or webserver2 (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)
Could there be something wrong with the installation maybe?
Thanks, Marek
Additional session info: print $factory->version(); # prints 6.5.7.0 print $factory->location(); # prints local
Marek,
I don’t have your input file but I suspect that using a long string with returns in it is not working, as “-sequence". Try using Sequence objects, rather than strings:
use Bio::Factory::EMBOSS; use Bio::SeqIO;
my $genome = 'genome.fa'; my $inseq = Bio::SeqIO->new( -file => $genome, -format => 'fasta', );
my %assembly;
while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq; }
my $factory = new Bio::Factory::EMBOSS; my $prog = $factory -> program('cpgplot');
foreach my $frag (keys %assembly){
print "Processing $frag\n";
my %input = (
-sequence => $assembly{$frag},
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'N',
);
print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";
}
Brian O.
On Feb 27, 2017, at 9:56 AM, piatekmj [email protected] wrote:
Hi Brian,
The code I'm testing is as follows:
use Bio::Factory::EMBOSS; use Bio::SeqIO;
my $genome = 'genome.fa'; my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta', );my %assembly;
while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq->seq; }
my $factory = new Bio::Factory::EMBOSS; my $prog = $factory -> program('cpgplot');
foreach my $frag (keys %assembly){
print "Processing $frag\n"; my %input = (
-sequence => "asis:$assembly{$frag}", -outfile => "${frag}.cpgplot", -window => 100, -minlen => 200, -minoe => 0.6, -minpc => 50, -plot => 'N', );
print "$frag\tCheck 1\t"; $prog->run(\%input); print "Check 2\n";
}
The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.
I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.
With -sequence => $assembly{$frag}, I get some failures reported: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat or Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2
With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.
The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot or webserver2 http://www.ebi.ac.uk/Tools/seqstats/emboss_cpgplot/ (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)
Could there be something wrong with the installation maybe?
Thanks, Marek
Additional session info: print $factory->version(); # prints 6.5.7.0 print $factory->location(); # prints local
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282742864, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99Bqc9Uh6NYpUW6iHX5mML-xp38yrks5rguQDgaJpZM4MLkmS.
Marek,
OK, I see you tried using Sequence objects. How long are the sequences in “genome.fa”?
Brian O.
On Feb 27, 2017, at 9:56 AM, piatekmj [email protected] wrote:
Hi Brian,
The code I'm testing is as follows:
use Bio::Factory::EMBOSS; use Bio::SeqIO;
my $genome = 'genome.fa'; my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta', );my %assembly;
while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq->seq; }
my $factory = new Bio::Factory::EMBOSS; my $prog = $factory -> program('cpgplot');
foreach my $frag (keys %assembly){
print "Processing $frag\n"; my %input = (
-sequence => "asis:$assembly{$frag}", -outfile => "${frag}.cpgplot", -window => 100, -minlen => 200, -minoe => 0.6, -minpc => 50, -plot => 'N', );
print "$frag\tCheck 1\t"; $prog->run(\%input); print "Check 2\n";
}
The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.
I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.
With -sequence => $assembly{$frag}, I get some failures reported: Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat or Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2
With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.
The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot or webserver2 http://www.ebi.ac.uk/Tools/seqstats/emboss_cpgplot/ (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)
Could there be something wrong with the installation maybe?
Thanks, Marek
Additional session info: print $factory->version(); # prints 6.5.7.0 print $factory->location(); # prints local
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282742864, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99Bqc9Uh6NYpUW6iHX5mML-xp38yrks5rguQDgaJpZM4MLkmS.
They are not very long. See the lengths of the sequences below: 1223259 unitig_9 1087134 unitig_0 934178 unitig_1 660843 unitig_2 453028 unitig_5 312397 unitig_6 114250 unitig_11 80635 unitig_3 66509 unitig_7 40100 unitig_4 6932 unitig_8 2384 unitig_10
Interestingly (or not), I only get reported results from 'unitig_11' down. Is there a sequence length limits imposed on cpgplot and other EMBOSS scripts?
Brian, it seems I was able to overcome the problem by using your suggestion of using sequence objects.
while (my $seq = $inseq->next_seq) {
$assembly{$seq->id} = $seq;
}
Then later on using -sequence => $assembly{$frag}
produces the exact output as from example webserver, without any error if used in conjunction with -plot => 'N'
.
It seems the major issue is now solved, leaving me wonder why was the message so cryptic? If you would be able to provide any assistance with regards to the PLPLOT issue, it would be great.
Marek,
Excellent. Regarding plplot, are you sure that the executable is in your PATH? I think it’s called pltek.
Brian O.
On Feb 27, 2017, at 12:05 PM, piatekmj [email protected] wrote:
Brian, it seems I was able to overcome the problem by using your suggestion of using sequence objects.
while (my $seq = $inseq->next_seq) { $assembly{$seq->id} = $seq; } Then later on using -sequence => $assembly{$frag} produces the exact output as from example webserver, without any error if used in conjunction with -plot => 'N'.
It seems the major issue is now solved, leaving me wonder why was the message so cryptic? If you would be able to provide any assistance with regards to the PLPLOT issue, it would be great.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282782879, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99IGcHzVTQwcEqrxfK3VJDlYER4wMks5rgwJtgaJpZM4MLkmS.
pltek is in my PATH indeed. If you instructed me, I could provide some more details regarding that.
Marek,
I don’t know much about how plotting works in EMBOSS, but this seems to work for me:
my %input = (
-sequence => $assembly{$frag},
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'Y',
-graph => 'pdf'
);
Brian O.
On Feb 27, 2017, at 3:10 PM, piatekmj [email protected] wrote:
pltek is in my PATH indeed. If you instructed me, I could provide some more details regarding that.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bioperl/bioperl-run/issues/47#issuecomment-282837915, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQ99Md6PnA3uo-3pTw65sKbJsDcM7cnks5rgy3DgaJpZM4MLkmS.