sga-align: prepareReads: Cannot parse record

Open sjackman opened this issue 9 years ago • 5 comments

sga-align -t 64 --name pe400 hsapiens-contigs.fa pe400.fa.gz
…
Completed Task = 'indexContigs' 
Task enters queue = 'prepareReads' 
Cannot parse record >HISEQ1:93:H2YHMBCXX:1:1101:1165:2015 at /gsc/btl/linuxbrew/bin/sga-deinterleave.pl line 63, <IN> line 2.

The file pe400.fa.gz is interleaved paired-end reads. The first 8 lines are:

>HISEQ1:93:H2YHMBCXX:1:1101:1165:2015 ec:Z:0_0:1_0_1:0_0
TTATACAAAGAATTAAGAACAAAAGTGAAATTGAATATTTTTTAATTGCTCTAAAAGTTAATGGACTATTTAAAACAAAAATTATAAAAATATGTTTATACCATTAATAGAAGTAAAATATATAAAACCATGGAATAACACACAGACTAGGAGGACTTGGGAATATGCTGTTACATTGCATATTAAGTGGTATTATATTATTTGAAGTTAGATTTATTAACAATTACAGAGCTAATTTTTTTTTTAAAAA
>HISEQ1:93:H2YHMBCXX:1:1101:1165:2015 ec:Z:0_0:1_0_1:0_0
CTGACATCTTTCTGGCATCCTTAAAAGCCCTGGCTTTTAAGCATAACTTCTTGACCTACTTGTTCCCTTCCTGAGCATGAGAGCAGTGGTGACTCAGGAACAGGAAAGGCAGACCACAGTGGTGACAGTGTTTTCCTCAAAGAGGATTTATACCTGTTTTTTTAAAAAAAAAATTAGCTCTGTAATTGTTAATAAATCTAACTTCAAATAATATAATACCACTTAATATGCAATGTAACAGCATATTCCC
>HISEQ1:93:H2YHMBCXX:1:1101:1157:2041 ec:Z:0_0:3_0_3:0_0
GACCCGGTCCTGCGATTTGTCCCGTTGTAGACCTGGGAACAGGCAGGCGGGAACTGGGGGCTTTACTGGGGGATTTGAGGCTGGGGAGGGGGAGGGAGCAAATGTCATGGCTGGCTCGCTCAAGCATCCAGGGAACCGAAGCTAAGCGCATCCTGACGGGCTTTTAAAATGACATTGATTAGGACAAGCTGTTCCCAACCCCAGTAAGAGTTAATCTGCCTGTTAATCAAGGCACTAAGGGGCTCAATGC
>HISEQ1:93:H2YHMBCXX:1:1101:1157:2041 ec:Z:0_0:29_0_28:2_0
CCCCGGGCAGCGGTTTTCCCCGCTAGCCAGGTTTGGAAGTCACCCTCTGTGAGACTGGGTTAGGAAGTGACGAAAAGCGCCGAATTGTTTTCAAATTGAAAATACTTTTTTTTTTTTTTTTGGAGATAGCGCTGACAAATATATGGGATCCCGGCTTTTGATCCCTGGCTGCCGCCTCTGTTCTCCTGTCGCTAATAAAACTCGCATTGAGCCCCTTAGTGCCTTGATTAACAGGCAGATTAACTCTTAC

sjackman, Jul 25 '16 21:07

What variant of FASTQ is that? I don't recognise the SAM-like key/value pair.

jts, Jul 25 '16 22:07

It's produced by BFC (https://github.com/lh3/bfc).

sjackman, Jul 25 '16 22:07

Is it safe to assume that the first record is always the first end of the pair? Alternatively, you could use the uncorrected reads in scaffolding (which I typically recommend anyway).

jts, Jul 25 '16 22:07

Yes, the first record is always the first read of the pair or mate pair: FR orientation for paired-end (PE) libraries and RF orientation for mate-pair (MP) libraries. Good suggestion. If there's no easy workaround for using the corrected reads, I'll use the uncorrected reads.

sjackman, Jul 25 '16 23:07
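
Since the first record of each pair is always the first read, one possible workaround is to split the interleaved FASTA by position rather than by name. The sketch below is only an illustration and is not part of SGA or BFC. It assumes that the parse failure comes from both mates sharing an identical name with no /1 or /2 suffix, which the thread does not confirm. The function names `read_fasta` and `deinterleave` are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deinterleave(handle, out1, out2):
    """Split alternating records into out1/out2, tagging names with /1 and /2.

    Relies on position only: record 2k is treated as read 1 and record 2k+1
    as read 2. The comment after the read name (e.g. the ec:Z: tag) is dropped.
    A trailing unpaired record is silently ignored.
    """
    records = read_fasta(handle)
    n_pairs = 0
    # zip() over the same iterator twice consumes two consecutive records.
    for (h1, s1), (h2, s2) in zip(records, records):
        name1, name2 = h1.split()[0], h2.split()[0]
        if name1 != name2:
            raise ValueError(f"Records out of order: {name1} vs {name2}")
        out1.write(f">{name1}/1\n{s1}\n")
        out2.write(f">{name2}/2\n{s2}\n")
        n_pairs += 1
    return n_pairs


# Small in-memory demonstration with made-up reads.
demo = io.StringIO(">r1 ec:Z:0\nACGT\n>r1 ec:Z:0\nTTTT\n")
first, second = io.StringIO(), io.StringIO()
print(deinterleave(demo, first, second), "pair(s) written")
```

For a real file, the input handle could be opened with `gzip.open("pe400.fa.gz", "rt")` and the outputs with ordinary `open(..., "w")` calls. Whether the downstream SGA steps accept the renamed reads has not been tested here.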

I instead aligned the reads using bwa mem:

bwa mem -t32 -p contigs.fa reads.fa.gz | samtools view -F2304 -b -o reads.bam -

sjackman, Jul 28 '16 22:07
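
For readers unfamiliar with the command above: `bwa mem -p` treats the input as interleaved pairs, and `samtools view -F2304` drops any alignment that has at least one of the bits in 2304 set. The short sketch below decodes that value using the flag bits from the SAM specification. The helper names `decode_flags` and `passes_filter` are hypothetical and exist only for this illustration.

```python
# SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flags(value):
    """Return the names of all flag bits set in value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def passes_filter(flag, exclude=2304):
    """Mimic `samtools view -F exclude`: keep a record only if none of the
    excluded bits are set in its flag."""
    return (flag & exclude) == 0


print(decode_flags(2304))   # the two bits excluded by -F2304
print(passes_filter(99))    # a primary alignment of read 1 in a proper pair
print(passes_filter(2147))  # the same flag with the supplementary bit added
```

So 2304 = 256 + 2048, meaning the filter removes secondary and supplementary alignments and leaves one primary record per read in reads.bam.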