Error with BAM files that have paired-end reads + singletons

Open bioinfo17 opened this issue 7 years ago • 13 comments

Hi,

I aligned my samples with Hisat2 against a merged genome + transcriptome file to retrieve all reads, and then converted the BAM files to R1.fastq, R2.fastq and singletons files. I then ran Hisat2 on the three files against the transcriptome FASTA file to pull out reads; the result is now in BAM format for use with Salmon. When I ran Salmon with the transcriptome FASTA file and the BAM file, I got this error:

WARNING: Detected suspicious pair --- The proper-pair statuses are inconsistent: read1 [HWI-7001326F:36:C7J2GANXX:5:1104:13209:80405] : proper-pair; mapped; matemapped

read2 : [HWI-7001326F:36:C7J2GANXX:5:1104:13209:80405] : no proper-pair; mapped; matemapped

[2017-07-07 10:46:02.309] [jointLog] [warning]

ERROR: Found unpaired read in a paired-end library. The read was marked as unpaired in sequencing (not just unmapped).The two ends of a paired-end read should be adjacent. Don't know how to proceed; exiting!

I tried sorting the BAM files with samtools sort -n, but I still get the above error. Any advice, please? Thanks

bioinfo17 avatar Jul 06 '17 22:07 bioinfo17
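For readers following along: the "proper-pair; mapped; matemapped" labels in the warning correspond to bits of the SAM FLAG field. Below is a minimal Python sketch that decodes those bits. The bit values come from the SAM specification; the function name describe_flag is made up for illustration and is not part of Salmon or samtools.

```python
# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1         # read was paired in sequencing
PROPER_PAIR = 0x2    # aligner considers the pair properly aligned
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def describe_flag(flag):
    """Decode the pairing-related bits of a SAM FLAG (hypothetical helper)."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "mapped": not (flag & UNMAPPED),
        "mate_mapped": not (flag & MATE_UNMAPPED),
    }


if __name__ == "__main__":
    # 99 = paired + proper pair + mate reverse + first in pair
    print(describe_flag(99))
    # 73 = paired + mate unmapped + first in pair
    print(describe_flag(73))
```

Two records with the same read name but different proper_pair values would match the inconsistency the warning reports.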

Is the number of reads in R1.fastq and R2.fastq the same?

hiraksarkar avatar Jul 06 '17 22:07 hiraksarkar

Yes, the number of reads in both FASTQ files is exactly the same.

bioinfo17 avatar Jul 06 '17 23:07 bioinfo17

Can you please paste the command you used to invoke Salmon? I just want to make sure everything looks right.

hiraksarkar avatar Jul 06 '17 23:07 hiraksarkar

salmon quant -p 6 -t Trinity.fna -l A -a 01.mapped.bam -o quants_test

bioinfo17 avatar Jul 06 '17 23:07 bioinfo17

Hi @bioinfo17, if I understand correctly, you did the following (correct me if I am wrong): (1) Made one reference file, concat(Genome.fa, txptome.fa). (2) Mapped a read set, say R, to the above reference and extracted a subset of reads R1 (both ends mapped) and R2 (single end mapped). (3) Mapped R1 and R2 again to txptome.fa using HISAT and generated a BAM, to keep only the reads that map to the transcriptome? (I am not sure about this part.)

Either way, the error seems to arise from non-matching flags of the read-pair entries in the BAM file. Can you try leaving your singleton set out of the last filtering step, just to confirm whether the set R2 is really the problem?

k3yavi avatar Jul 06 '17 23:07 k3yavi

Hi k3yavi, yes, you are correct regarding 1, 2 & 3. However, at step 2, I wrote the mapped reads to separate files: *.R1.paired.fastq, *.R2.paired.fastq and an individual *.singletons.fastq file. I then ran Hisat2 with the -1, -2 and -U parameters on the respective files and generated BAM files. I used these files in Salmon and encountered the error. I can run Hisat2 again without the -U option and see if Salmon works!

In a separate run, I used the transcriptome.fasta file in the first step and generated a BAM file with Hisat2. I then used that BAM file in Salmon, passing transcriptome.fasta to the -t flag. The error is below:

WARNING: Detected suspicious pair --- The names are different: read1 : HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937 read2 : HWI-7001326F:36:C7J2GANXX:4:2204:6152:63667 The proper-pair statuses are inconsistent: read1 [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : proper-pair; mapped; matemapped

read2 : [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : no proper-pair; mapped; matenot mapped

[2017-07-07 10:53:31.160] [jointLog] [warning]

Do the BAM files need to be sorted before using Salmon? Thanks

bioinfo17 avatar Jul 06 '17 23:07 bioinfo17

OK, I think (I could be wrong) the problem is that you are using stranded and unstranded library types at the same time, i.e. -1, -2 and -U. I don't know if Salmon can handle stranded and unstranded data at the same time, so you might have to remove the data of one type.

k3yavi avatar Jul 06 '17 23:07 k3yavi
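If you want to test whether the -U singletons are the culprit without re-running the aligner, one option is to separate the records by the "paired in sequencing" bit (0x1) of the SAM FLAG. The sketch below works on plain SAM text lines (for example the output of samtools view -h) and is only an illustration: the function name split_paired_unpaired is made up, and it assumes the singleton records are the ones without the 0x1 bit set.

```python
PAIRED = 0x1  # SAM FLAG bit: read was paired in sequencing


def split_paired_unpaired(sam_lines):
    """Split SAM text lines into (headers, paired, unpaired) lists.

    Hypothetical helper: records are classified only by FLAG bit 0x1.
    """
    headers, paired, unpaired = [], [], []
    for line in sam_lines:
        if line.startswith("@"):
            headers.append(line)  # header lines are kept as-is
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        flag = int(fields[1])  # FLAG is the second SAM column
        (paired if flag & PAIRED else unpaired).append(line)
    return headers, paired, unpaired


if __name__ == "__main__":
    demo = [
        "@SQ\tSN:t1\tLN:100",
        "readA\t99\tt1",
        "readA\t147\tt1",
        "readS\t0\tt1",
    ]
    headers, paired, unpaired = split_paired_unpaired(demo)
    print(len(headers), len(paired), len(unpaired))
```

Writing headers + paired to one file and headers + unpaired to another would give two inputs that can be quantified separately to see which one triggers the error.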

Also, I don't think sorting matters for Salmon, but the order of the paired-end reads should be the same in both files (i.e. *_1.fq and *_2.fq) or in the BAM file.

k3yavi avatar Jul 06 '17 23:07 k3yavi

@bioinfo17 — The bam file does not need to be sorted (by position), but all of the alignment records for a given read should appear together in the BAM file. That is, you shouldn't have alignment records for read A, followed by records for read B, followed by records for read A again. Most aligners follow that convention (e.g. Bowtie2 & STAR will output the alignments for a read consecutively in the file). Above, it looks like the problem is that Salmon is seeing the first read of a pair read1 [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : proper-pair; mapped; matemapped but the subsequent alignment record with the same read name disagrees that this read is aligned in a proper-pair (i.e. read2 : [HWI-7001326F:36:C7J2GANXX:4:1302:8789:95937] : no proper-pair; mapped; matenot mapped).

rob-p avatar Jul 06 '17 23:07 rob-p
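The "records for read A, then read B, then read A again" condition described above can be checked on SAM text before handing the file to Salmon. This is a minimal Python sketch, not anything Salmon itself ships: the function name first_noncontiguous_read is hypothetical, and it keeps every read name in memory, so it is only suitable for modest files or a sample of one.

```python
def first_noncontiguous_read(sam_lines):
    """Return the first read name whose records are split apart, else None.

    Hypothetical helper: a name is "split" if it reappears after
    records for a different read name have been seen.
    """
    seen = set()
    previous = None
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        name = line.split("\t", 1)[0]  # QNAME is the first SAM column
        if name != previous:
            if name in seen:
                return name  # this read's records are not adjacent
            seen.add(name)
            previous = name
    return None


if __name__ == "__main__":
    grouped = ["A\t99", "A\t147", "B\t0"]
    interleaved = ["A\t99", "B\t0", "A\t147"]
    print(first_noncontiguous_read(grouped))
    print(first_noncontiguous_read(interleaved))
```

On a real file the lines could come from piping samtools view into the script and iterating over sys.stdin.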

Thanks everyone for your prompt responses - I figured out where the problem was. BAM files don't need to be sorted, and it worked perfectly, even when using both stranded and unstranded libraries at the same time. SAM files worked perfectly too, and I believe I can skip converting the SAM files to BAM files.

Is it still OK to proceed with SAM/BAM files generated by Hisat2 using the -1, -2 and -U parameters?

bioinfo17 avatar Jul 07 '17 00:07 bioinfo17

Hi @bioinfo17, can you please elaborate? I am seeing the same error using STAR-aligned BAMs (to transcriptome) sorted with samtools 1.9 (default arguments, which the docs say means they are sorted as follows: 'Sort alignments by leftmost coordinates').

I am going to try sorting the BAMs by name (samtools argument -n) and re-trying Salmon.

radlinsky avatar Jan 07 '19 21:01 radlinsky

Hi @radlinsky, the problem may be the samtools-sorted BAM. Salmon assumes a read's mates are next to each other in the BAM, while samtools follows a different scheme of reporting all the first mates together and then the second ones, which creates an issue with Salmon.

k3yavi avatar Jan 07 '19 21:01 k3yavi
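To illustrate what "mates next to each other" means in practice, here is a small Python sketch that regroups SAM text lines so that all records sharing a read name become adjacent, keeping the order in which names first appear. On real BAM files this is the job of samtools sort -n (mentioned above) or samtools collate; the function group_by_read_name below is a made-up toy for small inputs, and it does not update the sort-order tag in the @HD header.

```python
def group_by_read_name(sam_lines):
    """Return SAM lines with records for each read name made adjacent.

    Hypothetical helper: holds everything in memory and relies on
    dict insertion order (Python 3.7+) to keep first-seen name order.
    """
    headers = []
    groups = {}
    for line in sam_lines:
        if line.startswith("@"):
            headers.append(line)  # header lines stay at the top
        elif line.strip():
            name = line.split("\t", 1)[0]  # QNAME is the first column
            groups.setdefault(name, []).append(line)
    records = [rec for recs in groups.values() for rec in recs]
    return headers + records


if __name__ == "__main__":
    # Coordinate order can separate the two mates of read A.
    coordinate_sorted = ["@SQ\tSN:t1\tLN:100", "A\t99", "B\t99", "A\t147", "B\t147"]
    for line in group_by_read_name(coordinate_sorted):
        print(line)
```

After regrouping, each pair's two records sit next to each other, which is the layout described in the comments above.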

Thanks everyone for your prompt responses - I figured out where the problem was. BAM files don't need to be sorted, and it worked perfectly, even when using both stranded and unstranded libraries at the same time. SAM files worked perfectly too, and I believe I can skip converting the SAM files to BAM files.

Is it still OK to proceed with SAM/BAM files generated by Hisat2 using the -1, -2 and -U parameters?

I have the same problem. How did you solve it?

MonaLiu421 avatar Dec 09 '22 01:12 MonaLiu421