salmon
Error with BAM files that have paired-end reads + singletons
Hi,
I used a merged genome + transcriptome file with Hisat2 to retrieve all reads in my samples, and then converted the BAM files to R1.fastq, R2.fastq and singletons files. I then ran Hisat2 with the three files against the transcriptome FASTA file to pull out the reads, which are now in BAM format, to be used in Salmon. When I ran Salmon with the transcriptome FASTA file and the BAM file, I got this error:
WARNING: Detected suspicious pair --- The proper-pair statuses are inconsistent: read1 [HWI-7001326F:36:C7J2GANXX:5:1104:13209:80405] : proper-pair; mapped; matemapped
read2 : [HWI-7001326F:36:C7J2GANXX:5:1104:13209:80405] : no proper-pair; mapped; matemapped
[2017-07-07 10:46:02.309] [jointLog] [warning]
ERROR: Found unpaired read in a paired-end library. The read was marked as unpaired in sequencing (not just unmapped).The two ends of a paired-end read should be adjacent. Don't know how to proceed; exiting!
I tried sorting the BAM files with samtools sort -n, but I still get the above error. Any advice, please? Thanks
Is the number of reads the same in both R1.fastq and R2.fastq?
Yes, the number of reads in both FASTQ files is exactly the same.
Can you please paste the command you used to invoke Salmon? I just want to make sure everything looks right.
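A quick way to run the read-count check from the question is to count FASTQ records. This is a minimal POSIX sh + awk sketch, assuming uncompressed FASTQ with the usual four lines per record (no wrapped sequence lines); `count_fastq_reads` is a hypothetical helper name, not a tool from the thread.

```shell
# Count records in an uncompressed FASTQ file, assuming the standard
# four-line layout (header, sequence, '+', quality) with no wrapped lines.
# A non-integer result means the file does not follow that layout.
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}
```

For example, `count_fastq_reads sample.R1.paired.fastq` and `count_fastq_reads sample.R2.paired.fastq` should print the same number (the `sample` prefix is a placeholder). Equal counts are necessary but not sufficient: the mates must also appear in the same order.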
salmon quant -p 6 -t Trinity.fna -l A -a 01.mapped.bam -o quants_test
Hi @bioinfo17,
If I understand correctly, you did the following (correct me if I am wrong):
(1) Made one reference file: concat(Genome.fa, txptome.fa).
(2) Mapped a read set, say R, to the above reference and extracted a subset of reads R1 (both mates mapped) and R2 (mapped as single-end).
(3) Mapped R1 and R2 again to txptome.fa using HISAT and generated a BAM to keep only the reads that mapped to the transcriptome? (I am not sure about this part.)
Either way, the error seems to arise from non-matching flags on the read-pair entries in the BAM file. Can you try not including your singleton set in the last filtering step, just to confirm whether the set R2 is really the problem?
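To see how many pairs have the flag mismatch Salmon warns about, one could list the read names whose mates disagree on the proper-pair bit (SAM FLAG 0x2). This is a minimal POSIX sh + awk sketch, assuming SAM text on stdin (for example from `samtools view`); `find_inconsistent_pairs` is a hypothetical helper, not part of Salmon or samtools, and it deliberately looks only at primary records so each pair contributes one record per mate.

```shell
# List read names whose primary alignment records disagree on the
# proper-pair bit (FLAG 0x2). Reads SAM text on stdin. Secondary (0x100)
# and supplementary (0x800) records are skipped, as are unpaired reads
# (0x1 not set). Bits are tested arithmetically so plain POSIX awk works.
find_inconsistent_pairs() {
    awk -F '\t' '
        /^@/ { next }                                  # skip header lines
        {
            flag = $2 + 0
            if (int(flag / 256) % 2 || int(flag / 2048) % 2) next
            if (int(flag) % 2 == 0) next               # unpaired: nothing to compare
            proper = int(flag / 2) % 2
            if (!($1 in seen)) { seen[$1] = proper; order[++n] = $1 }
            else if (seen[$1] != proper) bad[$1] = 1
        }
        END { for (i = 1; i <= n; i++) if (order[i] in bad) print order[i] }
    '
}
```

For example, `samtools view 01.mapped.bam | find_inconsistent_pairs | head` would show the first few offending read names, which can then be compared with and without the singleton set.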
Hi k3yavi, yes, you are correct regarding 1, 2 & 3. However, at step 2 I generated the mapped reads as two separate files, *.R1.paired.fastq and *.R2.paired.fastq, plus an individual *.singletons.fastq file. I ran Hisat2 with the -1, -2 and -U parameters on the respective files, generated BAM files, used these files in Salmon, and then encountered the error. I can run Hisat2 again without the -U option and see if Salmon works!!
In a separate run, I used the transcriptome.fasta file in the first step and generated a BAM file using Hisat2. Then I used the BAM file in Salmon, passing transcriptome.fasta to the -t flag. Error below:
WARNING: Detected suspicious pair --- The names are different: read1 : HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937 read2 : HWI-7001326F:36:C7J2GANXX:4:2204:6152:63667 The proper-pair statuses are inconsistent: read1 [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : proper-pair; mapped; matemapped
read2 : [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : no proper-pair; mapped; matenot mapped
[2017-07-07 10:53:31.160] [jointLog] [warning]
Do the BAM files need to be sorted before using Salmon? Thanks
OK, I think (could be wrong) the problem is that you are using stranded and unstranded library types at the same time, i.e. -1, -2 and -U. I don't know whether Salmon can handle stranded and unstranded data at the same time, so you might have to remove the data of one type.
Also, I don't think sorting matters for Salmon, but the order of the paired-end reads should be the same in both files (i.e. *_1.fq and *_2.fq), and likewise in the BAM file.
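The "same order in both files" condition can be checked directly by comparing read names record by record. This is a minimal POSIX sh + awk sketch, assuming uncompressed four-line FASTQ and Illumina-style headers where the name is the first whitespace-separated token, optionally ending in /1 or /2; `fastq_names` and `same_read_order` are hypothetical helper names.

```shell
# Print the read name from every FASTQ header line (line 1 of each 4-line
# record), dropping the leading '@', anything after the first whitespace,
# and a trailing /1 or /2 mate suffix.
fastq_names() {
    awk 'NR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); print name }' "$1"
}

# Exit status 0 only if both files list the same read names in the same order.
same_read_order() {
    a=$(mktemp) && b=$(mktemp) || return 2
    fastq_names "$1" > "$a"
    fastq_names "$2" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, `same_read_order sample_1.fq sample_2.fq && echo "mates in sync"` (file names are placeholders). Comparing names rather than just counts catches files that have the same number of reads but have drifted out of step.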
@bioinfo17, the BAM file does not need to be sorted (by position), but all of the alignment records for a given read should appear together in the BAM file. That is, you shouldn't have alignment records for read A, followed by records for read B, followed by records for read A again. Most aligners follow that convention (e.g. Bowtie2 & STAR will output the alignments for a read consecutively in the file). Above, it looks like the problem is that Salmon is seeing the first read of a pair (read1 [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : proper-pair; mapped; matemapped), but the subsequent alignment record with the same read name disagrees that this read is aligned in a proper pair (i.e. read2 : [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : no proper-pair; mapped; matenot mapped).
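The "all records for a read appear together" requirement can be tested without Salmon. This is a minimal POSIX sh + awk sketch, assuming SAM text on stdin (for example `samtools view 01.mapped.bam`, which omits the header unless -h is given); `find_split_reads` is a hypothetical helper that prints each read name whose records reappear after a different read's records.

```shell
# Report read names whose alignment records are not contiguous, i.e.
# records for read A, then another read, then read A again.
# Reads SAM text on stdin; each offending name is printed once.
find_split_reads() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines
        $1 != prev {
            if (($1 in closed) && !($1 in reported)) { print $1; reported[$1] = 1 }
            closed[prev] = 1                # the previous read group has ended
            prev = $1
        }
    '
}
```

Empty output means every read's records are grouped, so ordering is not the problem and the flags are the next thing to inspect. If names do show up, grouping records by name (e.g. `samtools sort -n -o 01.namesorted.bam 01.mapped.bam`) addresses the ordering, though not inconsistent flags.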
Thanks everyone for your prompt responses; I figured out where the problem was. The BAM files don't need to be sorted, and it worked perfectly, even when using both stranded and unstranded libraries at the same time. SAM files worked perfectly too, so I believe I can skip converting the SAM files to BAM files.
Is it still OK to proceed with SAM/BAM files generated from Hisat2 using the -1, -2 and -U parameters?
Hi @bioinfo17, can you please elaborate? I am seeing the same error using STAR-aligned BAMs (aligned to the transcriptome) sorted with samtools 1.9 (default arguments, which the docs say means: 'Sort alignments by leftmost coordinates').
I am going to try sorting the BAMs by name (samtools argument -n) and re-running Salmon.
Hi @radlinsky, the problem may be the samtools-sorted BAM. Salmon assumes a read's mates are next to each other in the BAM, whereas the default samtools sort orders records by leftmost coordinate, so the two mates of a pair generally end up apart, which causes issues for Salmon.
I also have the same problem; how did you solve it?