masurca
Error correction of PE reads failed. Check pe.cor.log.
Hi!
I'm trying to run MaSuRCA with Illumina paired-end libraries and PacBio long reads.
Here is my config file:
DATA
#Illumina paired end reads supplied as <forward_reads> <reverse_reads>
#if single-end, do not specify <reverse_reads>
#MUST HAVE Illumina paired end reads to use MaSuRCA
PE= pe 515 13 /home/jterol/PacBio/ivia000_1.fastq /home/jterol/PacBio/ivia000_2.fastq
#Illumina mate pair reads supplied as <forward_reads> <reverse_reads>
#pacbio OR nanopore reads must be in a single fasta or fastq file with absolute path, can be gzipped
#if you have both types of reads supply them both as NANOPORE type
PACBIO=/home/jterol/PacBio/PACBIO_clem.fa
#NANOPORE=/FULL_PATH/nanopore.fa
#Other reads (Sanger, 454, etc) one frg file, concatenate your frg files into one if you have many
#OTHER=/FULL_PATH/file.frg
END
PARAMETERS
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
EXTEND_JUMP_READS=0
#this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 0 if you have more than 15x coverage by long reads (Pacbio or Nanopore) or any other long reads/mate pairs (Illumina MP, Sanger, 454, etc)
USE_LINKING_MATES = 0
#specifies whether to run mega-reads correction on the grid
USE_GRID=0
#specifies queue to use when running on the grid MANDATORY
GRID_QUEUE=all.q
#batch size in the amount of long read sequence for each batch on the grid
GRID_BATCH_SIZE=300000000
#use at most this much coverage by the longest Pacbio or Nanopore reads, discard the rest of the reads
LHE_COVERAGE=25
#set to 1 to only do one pass of mega-reads, for faster but worse quality assembly
MEGA_READS_ONE_PASS=0
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler. do not worry about performance, number or processors or batch sizes -- these are computed automatically.
#set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS = cgwErrorRate=0.15
#minimum count k-mers used in error correction 1 means all k-mers are used. one can increase to 2 if Illumina coverage >100
KMER_COUNT_THRESHOLD = 1
#whether to attempt to close gaps in scaffolds with Illumina data
CLOSE_GAPS=1
#auto-detected number of cpus to use
NUM_THREADS = 32
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 3000000000
#set this to 1 to use SOAPdenovo contigging/scaffolding module. Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
SOAP_ASSEMBLY=0
END
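A quick sanity check before rerunning assemble.sh is to confirm that every file named in the DATA section exists and is non-empty at the path given. The sketch below is not part of MaSuRCA: `data_paths` and `missing_or_empty` are hypothetical helper names, and the parser only understands the `KEY=value` lines that appear in this thread's config (PE, PACBIO, NANOPORE, OTHER).

```python
import os

# Keys taken from the DATA section posted in this thread (an assumption:
# other MaSuRCA versions may accept more keys than these).
DATA_KEYS = ("PE", "PACBIO", "NANOPORE", "OTHER")


def data_paths(config_text):
    """Return (key, path) pairs for absolute paths on uncommented DATA lines."""
    found = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section markers such as DATA / END.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() not in DATA_KEYS:
            continue
        for token in value.split():
            if token.startswith("/"):  # the config asks for absolute paths
                found.append((key.strip(), token))
    return found


def missing_or_empty(pairs):
    """Return the paths that are not regular files or have zero size."""
    return [p for _, p in pairs
            if not os.path.isfile(p) or os.path.getsize(p) == 0]


example = """\
DATA
PE= pe 515 13 /home/jterol/PacBio/ivia000_1.fastq /home/jterol/PacBio/ivia000_2.fastq
PACBIO=/home/jterol/PacBio/PACBIO_clem.fa
#NANOPORE=/FULL_PATH/nanopore.fa
END
"""

for path in missing_or_empty(data_paths(example)):
    print("missing or empty:", path)
```

An empty result only shows that the files are present; it says nothing about whether their contents are in a format the pipeline accepts.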
And here is the output I get when running assemble.sh:
[mar oct 2 11:53:45 CEST 2018] Processing pe library reads
awk: line ord.:1: fatal: division by zero attempted
[mar oct 2 11:53:45 CEST 2018] Average PE read length
Illegal division by zero at -e line 1.
[mar oct 2 11:53:45 CEST 2018] Using kmer size of for the graph
[mar oct 2 11:53:45 CEST 2018] MIN_Q_CHAR: 64
[mar oct 2 11:53:45 CEST 2018] Error correct PE
[mar oct 2 11:54:01 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.
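Both division-by-zero messages appear while the script is working out the average PE read length, which would be consistent with zero reads having been counted, although the log alone does not prove that. A minimal sketch for counting reads and their mean length in a four-line-per-record FASTQ file, independent of MaSuRCA (`fastq_stats` and `open_reads` are hypothetical names, and the demo records are made up):

```python
import gzip
import io


def fastq_stats(handle):
    """Return (read_count, mean_length) for a 4-line-per-record FASTQ stream."""
    count = 0
    total = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            count += 1
            total += len(line.strip())
    # Guard against the same division by zero seen in the log.
    mean = total / count if count else 0.0
    return count, mean


def open_reads(path):
    """Open a plain or gzipped read file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


# Made-up records, only to show the call; real use would pass open_reads(path).
demo = io.StringIO("@r1\nACGT\n+\nBBBB\n@r2\nACGTAC\n+\nBBBBBB\n")
print(fastq_stats(demo))
```

Calling `fastq_stats(open_reads(path))` on each file from the `PE=` line should report a non-zero count if the files are readable at those paths.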
This is what my read files look like:
[root@clemen5 PacBio]# head ivia000_1.fastq
@HWI-ST459_0069:1:1:1263:1962#0/1
GGGGGGGGAGGGGAGGAGGGGAGGGGGGGGGGGTGGGGGTGAGTGGAGGANAGGAGGGGNGNGAATGAGGAGGTAAGGGGGGAGGTTGGGTGAGGGAAGC
+HWI-ST459_0069:1:1:1263:1962#0/1
_WQX_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST459_0069:1:1:1354:1977#0/1
GGAGGGGGGGGGGGGGGGGGCCGGGGGGGGGGCGGGGGGGGGGGCGAGGGNGGGGGGGGGGGGGGAGAGGTGGAGGGGGGGGGCAGGGGGTGAGGGGAGG
+HWI-ST459_0069:1:1:1354:1977#0/1
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[root@clemen5 PacBio]# head PACBIO_clem.fa
@m54221_171212_235526/4260368/0_10489
GTGAATGGAAAAAGGAGAATTTTCTTTCAGATATCGTACCATTCATTGAGATTTGATCTCGTCCTAACTGATAGCGATGGCCTCCCATTTTCATCCCGTTG
CTGAATAAGGACAGCTAACAAGTCCTCATCATGACATGAGCATCGTCTTGTTCTTCCTTTGTCTCCGTTGTTGTCAAACTCTCTCATCTATAATCGCATCA
TGATACTTGAGCAGTTCTCATAAGCGTCACTATAAATTTTTTTCAATGCCTTCCAAATCGAACACTCGCATCCAGGGAACATAATCGGATAGGCGAAC...
Any suggestions?
Thank you very much in advance for your help.
You should try supplying the PacBio reads in FASTA format. Best,
-- Fuyou Fu, Ph.D. Department of Botany and Plant Pathology Purdue University USA
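For anyone wanting to act on this suggestion: in the `head` output from the original post, PACBIO_clem.fa has a .fa extension but its first record starts with `@` rather than `>`, and the listing is cut off before showing whether `+` and quality lines follow. The sketch below rewrites such FASTQ-like text as FASTA while tolerating wrapped sequence lines and records with no quality block. It is not a MaSuRCA tool, `sniff_format`, `read_records` and `write_fasta` are hypothetical names, and nothing in the thread confirms that converting the file resolves the PE error-correction failure.

```python
import io


def sniff_format(first_line):
    """Guess the format from the first character of the first line."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    return "unknown"


def read_records(lines):
    """Yield (name, sequence) from FASTA or FASTQ-like text.

    Sequence lines may be wrapped. After a '+' line, quality lines are
    skipped until as many quality characters as bases have been read, so a
    quality line that happens to start with '@' is not taken for a header.
    """
    name, seq = None, []
    qual_left, in_qual = 0, False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if in_qual:
            qual_left -= len(line)
            if qual_left <= 0:
                in_qual = False
            continue
        if line.startswith("@") or line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        elif line.startswith("+"):
            qual_left = sum(len(s) for s in seq)
            in_qual = qual_left > 0
        elif line.strip():
            seq.append(line.strip())
    if name is not None:
        yield name, "".join(seq)


def write_fasta(records, out):
    """Write records as FASTA with one sequence line per record."""
    for name, seq in records:
        out.write(">" + name + "\n" + seq + "\n")


# Made-up input with a wrapped sequence and a quality line starting with '@'.
demo = io.StringIO("@r1\nACGT\nAC\n+\nBBBB\n@B\n@r2\nGG\n+\nBB\n")
out = io.StringIO()
write_fasta(read_records(demo), out)
print(out.getvalue(), end="")
```

For a real file, the same two calls would take an open input handle and an open output handle in place of the `io.StringIO` objects.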
Dear jterol: have you solved your problem? I am trying to run MaSuRCA with only Illumina paired-end reads and hit the same problem. Could it be caused by running out of memory?