bwa
Paired reads have different names
Hi,
I ran into the "paired reads have different names" error with some of my sequencing data. The data are paired-end reads generated on a MiSeq.
The bwa commands I used were as follows:
$ bwa index ref.fasta
$ bwa mem ref.fasta read1.fastq read2.fastq -v 2 > map.sam
It terminated prematurely with the error message:
[M::mem_pestat] several lines of information...
[mem_sam_pe] paired reads have different names: "MISEQ-Sample1:1:2106:12181:2146", "MISEQ-Sample1:1:2106:12181:21461"
bwa mem mapping failed.
Problem mapping with bwa mem.
Problem mapping to the reference in ref.fasta. Quitting.
I used grep to show the relevant records in the input FASTQ files:
$ grep -n -A 3 MISEQ-Sample1:1:2106:12181:2146 read1.fastq
5578609:@MISEQ-Sample1:1:2106:12181:2146/1
5578610-ATGCTGCAATTATAAGAGAGGTTGAGATTATCATTGCCAAAACTGATAGTGCTATTTGTGCTATAGATTTTAAATTTAATTTGTATAAACAAGAGGATATTACAATGAGATGATTAAGAGTATCCCAGGTCTTTTCTAGAGTCCCGGCAGTGCGTTGATTCTTGTTTTTGGACATTGTTGCATTTGCCCCCCCCAGATCGGAGAGCACACGTCTGAACTCCAGTCACTCGCCACAATCTCGTATGCCGTCTTCTGCTTGAAAAAA
5578611-+
5578612-CCCCCGGGGGGGGGGGGFGEGGGGGGGGGGGGGGGGGGGGGGGFGGFGGFFFGGGFFGFEGGGGGGGFGFGGEGGGGGGGGGGFGGCGGGGGGGGGGGGGGGGGGGGGFGGGGGFFGGGGGGCGFGFGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGG>8CFGGDGCDFE:E=FGGGG1>FCEGGGG
--
5771065:@MISEQ-Sample1:1:2106:12181:21461/1
5771066-GCTGGCTTGTTGTTCTGTGTTGGAGTAGAGGTTGTGCTTTTGGTTTGTGCTGTTGTATGGTGTGTTTCTGATTTTGTATTGGGTGATATTGTGGCTGAGTTTGTGTGGATTGGTGGTGTGGCTGTGGGTTGTTCGGATGGGCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGCCACAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACAAACACATAATATAAATACCACTGTGTCATCTGTTAGATGCAA
5771067-+
5771068-CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@FGGGGGGGGGGEGGGGGGGGGGEGCGGGGGGG8:12***2**+2+2+++2++++22+25++2++++++3+0+30++3+
$ grep -n -A 3 MISEQ-Sample1:1:2106:12181:2146 read2.fastq
5578609:@MISEQ-Sample1:1:2106:12181:2146/2
5578610-GGGGGGGGCAAATGCAACAATGTCCAAAAACAAGAATCAACGCACTGCCGGGACTCTAGAAAAGACCTGGGATACTCTTAATCATCTCATTGTAATATCCTCTTGTTTATACAAATTAAATTTAAAATCTATAGCACAAATAGCACTATCAGTTTTGGCAATGATAATCTCAACCTCTCTTATAATTGCAGCATAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTCTAGGACGGTGTAGAGCCCGGTGGTCGCCGTGGCAGTA
5578611-+
5578612-CCCCCGGGGGGGGGGGGGGGGGGFGGFFGGGFGGGGGGGGFGGGGGGFGCEGGGFFFGGGFGCFFGGGGDFGGGFGGDGGGDFGFG?FFFFGGGGGFCGGGGGGGGCFGGGGGGFGGGGGGGGGGGFGGGGGGFFGGGGFEFFFGFGGGGGGGGGGFEGGGCGFFGGGFFFGGGGGGGGGGGGGGGGGGGGGGGFFCGGGGGGGGGGGGGC?*2CGGGGGFGC*02<CDECC097E3**2<**2**/:8DC**85)/./)0.*8*
--
5771065:@MISEQ-Sample1:1:2106:12181:21461/2
5771066-GGCCCATCCGAACAACCCACAGCCACACCACCAATCCACACAAACTCAGCCACAATATCACCCAATACAAAATCAGAAACACACCATACAACAGCACAAACCAAAAGCACAACCTCTACTCCAACACAGAACAACAAGCCAGCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTCTAGGACGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAACCACAACACCACAACAAACCAAGCACCGGACTAACACC
5771067-+
5771068-CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCFGGGGGGGGFGGGGFGGGGGGGGGGGGGGGGGFGGCGGGGGGFGGGGGEF@EFGGGFFGGGGCGCCFGFGG>@CC:EECEE?8CCFGGGCFGGGEECE*2*:****1*/9C***1*****1*/.9***)*)1.)2*)*6<
It seems to me that bwa is unable to differentiate MISEQ-Sample1:1:2106:12181:2146 and MISEQ-Sample1:1:2106:12181:21461, which differ only by the extra final digit.
I tried changing the read name MISEQ-Sample1:1:2106:12181:21461 to MISEQ-Sample1:1:2106:12181:21463, and the run terminated again with the same error, this time for a different pair of reads:
[mem_sam_pe] paired reads have different names: "MISEQ-Sample1:1:1108:19211:1173", "MISEQ-Sample1:1:1108:19211:11731"
bwa mem mapping failed.
Problem mapping with bwa mem.
Problem mapping to the reference in ref.fasta. Quitting.
I thought this might be an issue. Could you please help look into it?
Many thanks, Michael
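A side note on the grep check above: the pattern MISEQ-Sample1:1:2106:12181:2146 is a prefix of the second read name, so grep reports both records. A minimal Python sketch of an exact-name lookup follows; it assumes well-formed 4-line FASTQ records, and the helper names `read_names` and `find_exact` are made up for illustration.

```python
import io

def read_names(handle):
    """Yield (record_number, name) for each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            # Drop the leading '@', any comment after whitespace,
            # and a trailing /1 or /2 mate suffix.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield i // 4 + 1, name

def find_exact(handle, wanted):
    """Return the record numbers whose name equals `wanted` exactly."""
    return [n for n, name in read_names(handle) if name == wanted]

# Tiny in-memory demo; real use would pass an open FASTQ file instead.
demo = io.StringIO(
    "@MISEQ-Sample1:1:2106:12181:2146/1\nACGT\n+\nIIII\n"
    "@MISEQ-Sample1:1:2106:12181:21461/1\nTTGA\n+\nIIII\n"
)
print(find_exact(demo, "MISEQ-Sample1:1:2106:12181:2146"))
```

Running the same lookup on both input files and comparing the record numbers shows whether a given read sits at the same position in each file.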
Hi @MichaelVirology
Have you fixed your problem?
I think I have run into the same problem.
nohup ~/software/bwa-0.7.17/bwa mem ~/reference/hg38/hg38bwaidx E1_input.fq.gz E1_pulldown.fq.gz 1>E1.sam 2>E1.bwa.align.log &
and it shows
nohup: ignoring input
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 200000 sequences (10000000 bp)...
[M::process] read 200000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "CL100056099L2C001R002_24", "CL100056099L2C001R002_61"
I also checked the reads named in the error:
$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_24"
paste: paste: No such file or directory
$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_24"
@CL100056099L2C001R002_24
$ paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_24"
@CL100056099L2C001R002_2406
$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_61"
paste: paste: No such file or directory
$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_61"
@CL100056099L2C001R002_615
$ paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_61"
@CL100056099L2C001R002_61
It seems the reads don't have the same names in the R1 and R2 FASTQ files. According to https://www.biostars.org/p/239535/, such data can't be trusted. Is there any other way to fix this? Thanks so much for any help.
If bwa terminates with this error, it usually means that the paired reads in the two FASTQ files are out of order.
From BBTools Repair Guide: With paired reads in 2 files, the first read in file 1 must be the mate of the first read in file 2, etc. For paired reads in a single interleaved file, the second read is the mate of the first read, and the 4th read is the mate of the 3rd read, etc.
There are a few solutions that worked for me.
- Sort the FASTQ files
  - BBTools repair.sh is supposed to do that, but it seems it cannot fix large datasets because it loads the whole file into RAM.
  - FASTQ-SORT is much better for large FASTQ files. It buffers on the hard drive instead of in RAM, and it successfully sorted a ~150 GB file.
- Use another aligner
  - Bowtie 2 doesn't require mates to be in the same order, but in my experience you get fewer mapped reads when aligning out-of-order FASTQ files with Bowtie 2.
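Before sorting anything, it can help to find where the two files first fall out of step. The following is a minimal Python sketch of such a check, assuming well-formed 4-line FASTQ records; the function names are hypothetical, and the name normalisation (drop the comment, drop a trailing /1 or /2) is an assumption about how the names should be compared.

```python
import io
from itertools import zip_longest

def names(handle):
    """Yield the normalised read name of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name

def first_mismatch(h1, h2):
    """Return (record_number, name1, name2) for the first out-of-sync
    record, or None if both files agree record by record.
    A missing record (files of different length) shows up as None."""
    pairs = zip_longest(names(h1), names(h2))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return n, a, b
    return None

# Tiny in-memory demo; real use would pass two open FASTQ files.
r1 = io.StringIO("@r1/1\nA\n+\nI\n@r2/1\nC\n+\nI\n")
r2 = io.StringIO("@r1/2\nT\n+\nI\n@r3/2\nG\n+\nI\n")
print(first_mismatch(r1, r2))
```

For gzipped input, the handles could come from `gzip.open(path, "rt")`. The check streams both files, so memory use stays small even for large datasets.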
@tomaskopsa Hi Tomas, thank you so much for sharing. I have checked the read names in my two files:
$ gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1| head
@CL100056099L2C001R002_24
@CL100056099L2C001R002_40
@CL100056099L2C001R002_45
@CL100056099L2C001R002_73
@CL100056099L2C001R002_74
@CL100056099L2C001R002_81
@CL100056099L2C001R002_90
@CL100056099L2C001R002_91
@CL100056099L2C001R002_95
@CL100056099L2C001R002_103
$ gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1| head
@CL100056099L2C001R002_61
@CL100056099L2C001R002_115
@CL100056099L2C001R002_154
@CL100056099L2C001R002_218
@CL100056099L2C001R002_228
@CL100056099L2C001R002_253
@CL100056099L2C001R002_255
@CL100056099L2C001R002_269
@CL100056099L2C001R044_184179
@CL100056099L2C001R002_305
It seems the read names are completely different, so maybe I can't use BBTools repair.sh to fix the order. I have also read http://seqanswers.com/forums/showthread.php?t=46538 https://www.biostars.org/p/254155/ https://www.biostars.org/p/160701/ https://www.biostars.org/p/176230/ so I guess the disorder may have been caused by dumping or trimming, because it was my PI who gave me these trimmed data. Following the last reference, I added -p to my bwa mem command:
$ nohup ~/software/bwa-0.7.17/bwa mem -M -R "@RG\tID:E1\tSM:E1\tLB:ATACseq\tPL:Illumina" ~/reference/hg38/hg38bwaidx -p E1_input.fq.gz E1_pulldown.fq.gz 1>E1.sam 2>E1.bwa.align.log &
Although bwa mem now runs successfully, I am not sure about the alignment quality, so I have also decided to test with Bowtie 2. Thank you again for sharing your Bowtie 2 experience.
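One way to judge whether two files can be re-paired at all is to count how many read names they share. The following is a minimal Python sketch under the assumption of well-formed 4-line FASTQ records; `name_set` and `overlap` are hypothetical helper names, and both name sets are held in memory, so this is only practical for moderately sized files or a subsample.

```python
import io

def name_set(handle):
    """Collect the normalised read names of a 4-line FASTQ stream."""
    found = set()
    for i, line in enumerate(handle):
        if i % 4 == 0:
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            found.add(name)
    return found

def overlap(h1, h2):
    """Return (shared, total_in_1, total_in_2) read-name counts."""
    a, b = name_set(h1), name_set(h2)
    return len(a & b), len(a), len(b)

# Tiny in-memory demo with names taken from the listings above.
f1 = io.StringIO(
    "@CL100056099L2C001R002_24\nA\n+\nI\n"
    "@CL100056099L2C001R002_40\nC\n+\nI\n"
)
f2 = io.StringIO(
    "@CL100056099L2C001R002_61\nG\n+\nI\n"
    "@CL100056099L2C001R002_40\nT\n+\nI\n"
)
print(overlap(f1, f2))
```

If the shared count is a small fraction of either total, the files are unlikely to be the two mates of one paired-end run, and re-sorting will not make them pair up.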
Hi @qiaowei-vvjoe, are you sure your FASTQ files contain paired reads?
Hi @tomaskopsa, frankly speaking, I am not sure. I just got these FASTQ files from my PI, and when I ran FastQC on them it showed no adapters. I don't know how the libraries were prepared and trimmed, and my PI may not know either. I only have this document, analysis.docx, which shows their information. Could you please give me some advice on my downstream analysis? Thank you so much in advance.
I have the exact same error in my paired-end bwa run for only four of 75 samples sequenced in the same batch.
When I run BBTools' repair.sh, I end up with all reads in a singleton file, and the fixed R1 and R2 files are empty. I've run these samples before and they worked fine. Does anyone know what these BBTools results mean?
Executing jgi.SplitPairsAndSingles [rp, in1=INPA_A1990_Tun_och_fer-READ1.fastq, in2=INPA_A1990_Tun_och_fer-READ2.fastq, out1=INPA_A1990_Tun_och_fer-READ1-fixed.fastq, out2=INPA_A1990_Tun_och_fer-READ2-fixed.fastq, outs=INPA_A1990_Tun_och_fer-singletons-repair.fastq, -Xmx12G]
Set INTERLEAVED to false
Started output stream.
Input: 1937896 reads 281176225 bases.
Result: 1937896 reads (100.00%) 281176225 bases (100.00%)
Pairs: 0 reads (0.00%) 0 bases (0.00%)
Singletons: 1937896 reads (100.00%) 281176225 bases (100.00%)
Time: 14.529 seconds.
Reads Processed: 1937k 133.39k reads/sec
Bases Processed: 281m 19.35m bases/sec
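To illustrate what re-pairing by name involves, here is a minimal in-memory Python sketch. It is not how repair.sh is implemented; it assumes well-formed 4-line FASTQ records, unique names within each file, and names that match once the comment and a trailing /1 or /2 are removed. The function names are hypothetical.

```python
import io

def records(handle):
    """Yield (normalised_name, four_lines) for each FASTQ record."""
    while True:
        rec = [handle.readline() for _ in range(4)]
        if not rec[0]:
            return
        fields = rec[0][1:].split()
        name = fields[0] if fields else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, rec

def repair(h1, h2):
    """Return (pairs, singletons); file 2 is held in memory."""
    mates = dict(records(h2))
    pairs, singles = [], []
    for name, rec in records(h1):
        mate = mates.pop(name, None)
        if mate is None:
            singles.append(rec)
        else:
            pairs.append((rec, mate))
    # Whatever is left in file 2 never found a partner in file 1.
    singles.extend(mates.values())
    return pairs, singles

# Tiny in-memory demo: r2 is paired, r1 and r3 end up as singletons.
f1 = io.StringIO("@r1/1\nA\n+\nI\n@r2/1\nC\n+\nI\n")
f2 = io.StringIO("@r2/2\nG\n+\nI\n@r3/2\nT\n+\nI\n")
pairs, singles = repair(f1, f2)
print(len(pairs), len(singles))
```

Under this scheme, a result of zero pairs and 100% singletons means that no normalised name in one file matched any name in the other, for example because the two files use different naming conventions or come from different runs.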
Use BBMap instead; BBTools is no longer available.
https://anaconda.org/bioconda/bbmap
Hi @tomaskopsa could you please let me know the command you used to sort with FASTQ-sort? I'm working with WGS read mapping and several of my raw reads present the same "Paired reads have different names" issue.
Thanks for your reply
Is there a way to disable bwa from throwing an error when the names of read 1 and read 2 do not match? I am working with a lot of data from the NCBI SRA, and in many cases the reads have names like "HWI-1KL117:327:C6CF1ACXX:8:1101:1319:1990_forward" and "HWI-1KL117:327:C6CF1ACXX:8:1101:1319:1990_reverse". The reads really are correctly paired, it is just that they are named with "_forward" and "_reverse". I'm hoping there is a way of disabling the error, since it would take a lot of computation to go through and strip the "_forward" and "_reverse" tags from hundreds of gigabytes of reads.
Downstream analysis programs will expect paired reads to have identical QNAME values. So something is going to have to strip those _forward and _reverse suffixes.
Your choices would be:
- Encourage SRA to produce more standard FASTQ files.
- BWA already strips /1 and /2. So you can patch your local version of BWA to also strip these two suffixes.
- Write a script to strip them or convert them to /1 and /2. If this is organised as a streaming filter in front of bwa's input, the added load will be trivial.
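The third option could look roughly like the following Python sketch. It assumes 4-line FASTQ records and that the tag is the last part of the read name (before any whitespace-separated comment); `strip_suffix` and `filter_stream` are hypothetical names, not part of bwa or any existing tool.

```python
import io
import re

# Match _forward/_reverse only at the end of the first header token.
SUFFIX = re.compile(r"^(@\S*?)_(?:forward|reverse)(?=\s|$)")

def strip_suffix(header):
    """Remove a trailing _forward/_reverse tag from the read name."""
    return SUFFIX.sub(r"\1", header, count=1)

def filter_stream(src, dst):
    """Copy a FASTQ stream, rewriting only the header lines."""
    for i, line in enumerate(src):
        dst.write(strip_suffix(line) if i % 4 == 0 else line)

# Tiny in-memory demo using the read name quoted above.
src = io.StringIO(
    "@HWI-1KL117:327:C6CF1ACXX:8:1101:1319:1990_forward\nACGT\n+\nIIII\n"
)
out = io.StringIO()
filter_stream(src, out)
print(out.getvalue(), end="")
```

Calling `filter_stream(sys.stdin, sys.stdout)` from a small script would let it sit in a pipe, one instance per input file, with each output fed to bwa through a named pipe or shell process substitution. Only every fourth line is touched, so the cost is small compared with the alignment itself.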
Thanks! Good to know, frustrating that the data comes in a non-standard format. Good idea about stripping them while streaming to bwa, I'll do that.
Checking SRR19880797:
[mem_sam_pe] paired reads have different names: "SRR19880797.5358018", "SRR19880797.10839728"
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 9982028
[main_samview] truncated file.
Mapping failed
samtools view -h ./SRR19880797/SRR19880797.bam |less +9982028 -SN
(reference) line 3370: SRR19880797.1 65 chr8 143932417 60 100M chr22 20819568 0 TGGCGGTCATGTTGGTGTTGCGGTCGCTCCAGTCGAAGCCCACCTCCTCCTCCTCCTTCTCATTCAGCCACATTAGCTCCTTAGTGGCGGTTGCCACAAA FFFGFCFFGFFFGFFFFFFFFFFFGGFFGFFFEFFDFGFFFFFFGFFFFFGFGGFGGFF@GGGFGFFFFFBFFGAFFGFFFGGDAGFGDG+@D9@/?=59 NM:i:1 MD:Z:90C9 AS:i:95 XS:i:23 RG:Z:SRR19880797
(reference) line 3371: SRR19880797.1 129 chr22 20819568 60 100M chr8 143932417 0 AGAGGGATTTTCTTCGCAGGGGAGCTTAACAGGGTCTTTCTCCTCTGCTCTTTCCCCAGTAGCCCAGGCCCACCTGAGAGATGCTGGACACACTGCTGGT GFDFFFFFF;FFF9FFFFGFFFBFFFFGFFFFFFFFFEFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFFF>FFFGFFFEFFFFFFFF@FFDFFFFEECG: NM:i:0 MD:Z:100 AS:i:100 XS:i:20 RG:Z:SRR19880797
(line before the error) line 9982027: SRR19880797.5023186 81 chr22 22643043 0 100M chr3 126500710 0 AGCGAGGTGACCTGGGCTGAGTCCTGGGAATGGGAAGAGGTGGCAGGAAGGGGATCTGAGGAGGAGAACAGGGGGCCTGGTGGTCTGTGCTTCTTCCCAG FF;AFG@GGFFDGGFFE>DFFFFFFGGFGFGFGFFEFFGFEFGGGFFGGFGGFGFEEGGEGFGFF>GGGFFFFFGGFFFGGFFEGFFGFFFFFFFFFFGG NM:i:0 MD:Z:100 AS:i:100 XS:i:100 RG:Z:SRR19880797
(error line) line 9982028: SRR19880797.5023186 161 chr3 126500710 60 100M chr22 22643043 0 TCCTTGAACACAGCAGGGTTGGAGGCCATGAGGCTCTGGGCCTCCGTGAAGCTGAGCTGCACAGGGTAGTAGCCGCCATTGAACGGGTTGTGGCAGGATG FFFFFGDFFGFFEFFFFFFEFF@FFFFFEFFFFFFFFFFFFFGFDGFGF;FFGGEGFFGEFFFFFFFF>FFFFFFEFFFFGFFDFFG@FF<FFFDF=@FG NM:i:0 MD:Z:100 AS:i:100 XS:i:0 RG:Z:SRR19880797
less SRR19880797_sort.1.fastp.fastq.gz
(reference)
@SRR19880797.5023185 5023185/1
CTGTGGCCCTGTGCCAAACCTGGAGCAGCTGCCTTTAGAGGCCAGGAGGGCTACTTCCCGTTTCCTGAGCACTGTCCCTCTGTCTGCAGGAGTGCTGCTG
+
FF@FFFFFFFFGFFFFFFFGAFFEFFFFFFGFFFFGCFFGFEFGGF<FF>>GGFFFFGGFFGFFFGGBGGFGGGFFGGFFGFFFBGFGFFFFFDFFFGFC
(error)
@SRR19880797.5023186 5023186/1
CTGGGAAGAAGCACAGACCACCAGGCCCCCTGTTCTCCTCCTCAGATCCCCTTCCTGCCACCTCTTCCCATTCCCAGGACTCAGCCCAGGTCACCTCGCT
+
GGFFFFFFFFFFGFFGEFFGGFFFGGFFFFFGGG>FFGFGEGGEEFGFGGFGGFFGGGFEFGFFEFFGFGFGFGGFFFFFFD>EFFGGDFFGG@GFA;FF
(reference)
@SRR19880797.5023187 5023187/1
AGGACACGGTACAAAAGGGCAGCCAGGCAGGGTTGGAAGGTGGGGTCTGAGGGGTTTCCACCTGCCCTCTCCCATCCTTCCAGGTTTTGGCGGCAGATGG
+
F?FFFFGFF/FGFGFF>FFFFFFFFFFFFFFFFFFFFEFFFFEFFD@FFCFFEFFFFGGFFFFDFDFFFGFFFFFFFFFEFFGFBFGFFECFF:DFBFFF
less SRR19880797_sort.2.fastp.fastq.gz
(reference)
@SRR19880797.5023185 5023185/2
GGCTGGCCCAGCGCCAGCGTCGGAGCGCCGGCCCCCTCCCCGGGCCGCCCCCACCCAACCAGACCCTCCAGCGCGTGCCACCGGACCTCGTGTCCTAGAC
+
)<;7@CDB1B:AA=DCB3AE;D?>C:5=61?469@19+9*7&&>A'@4;)9&8?>&8E3>76*='(BB,>&<&2EC'4;?=9.4>+5
(error)
@SRR19880797.5023186 5023186/2
TCCTTGAACACAGCAGGGTTGGAGGCCATGAGGCTCTGGGCCTCCGTGAAGCTGAGCTGCACAGGGTAGTAGCCGCCATTGAACGGGTTGTGGCAGGATG
+
FFFFFGDFFGFFEFFFFFFEFF@FFFFFEFFFFFFFFFFFFFGFDGFGF;FFGGEGFFGEFFFFFFFF>FFFFFFEFFFFGFFDFFG@FF<FFFDF=@FG
(reference)
@SRR19880797.5023187 5023187/2
AAATTCCACAAGAGGGTCATTAAGTGTGATAGTGGAAATGCCCTAACCTCCACCCTTACTTCTCAAATATTCTAGCTATTGGAGATAAAGTACCATATAC
+
GFFFFFGFF?FGFFFFFFFGFGFGFGFFFFFGFGFFFFFGFFFGFFFFFF>GFFFFFFFFFFFGGFFFGFFFFCEFFGGFGFFFFFFFFFFFFGFGGFFF
What's wrong with my file?
@jingydz did you ever discover what was wrong with your file? I'm having a similar issue