Issue using Salmon quant with BAM files

Open taylor-murphy2020 opened this issue 3 years ago • 6 comments

Hello, we are currently trying to run Salmon quant on BAM files that were generated from another lab's work. We keep getting the error below and are unsure what the problem is. Our original command used the libType A (-l A) parameter, which gave us a warning similar to the one below. We then sorted the BAM files by name using samtools sort -n, ran the command again, and got the same error. We have tried many different library types and keep getting the same error. Our reference transcriptome comes from a de novo transcriptome assembly produced by the previous lab.

WARNING: Detected suspicious pair --- 
        The names are different:
        read1 : D61655M1:276:D10YJACXX:3:1114:17825:17511
        read2 : D61655M1:276:D10YJACXX:3:1114:17825:18115
        The proper-pair statuses are inconsistent:
read1 [D61655M1:276:D10YJACXX:3:1114:17825:17511] : proper-pair; mapped; matemapped

read2 : [D61655M1:276:D10YJACXX:3:1114:17825:17511] : no proper-pair; mapped; matenot mapped

We have seen that others have had a similar issue, but we haven't found a thread that mentions what they did to fix it. Any suggestions would be appreciated! We are using salmon v0.14.1.

taylor-murphy2020 • Nov 04 '20 16:11

The assumptions salmon makes about BAM files are the same as those made by RSEM (except that salmon accepts alignments with insertions or deletions in the CIGAR string). If STAR is run with the --quantMode TranscriptomeSAM flag (and the output BAM file is not sorted), it should work as is. Any idea how STAR was run here? You can presumably get the command line from the BAM header. @k3yavi: any suggestions for how to properly collate the reads?
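A minimal sketch of pulling the command line out of the BAM header, assuming samtools is installed: the @PG header records usually carry an ID: tag naming the program and a CL: tag holding its command line, although not every tool writes CL:. The function name show_pg_command_lines is hypothetical.

```shell
# Hypothetical helper: read a SAM header on stdin and print, for each @PG line,
# the program ID (ID: tag) and its recorded command line (CL: tag), tab-separated.
show_pg_command_lines() {
  awk -F '\t' '
    $1 == "@PG" {
      id = ""; cl = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)
        else if ($i ~ /^CL:/) cl = substr($i, 4)
      }
      print id "\t" cl
    }'
}

# Usage (assumes samtools is on PATH and input.bam exists):
#   samtools view -H input.bam | show_pg_command_lines
```

An empty CL: column means the tool that wrote that @PG line did not record its command line.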

rob-p • Nov 04 '20 16:11

Hey Rob!

Thanks for the reply! The BAM files give no indication that STAR was used, and the previous lab never said they used it.

taylor-murphy2020 • Nov 05 '20 02:11

Is there any reason you still want to use 0.14.1? You should probably upgrade to 1.3.0.

I'm not suggesting that upgrading will fix your issue, but salmon has gained many new features since v1 (@rob-p can elaborate), and IMO you should be using salmon >= 1.2.
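A minimal sketch for checking whether an installed salmon meets the >= 1.2 suggestion, assuming GNU coreutils `sort -V` (version sort) is available and that `salmon --version` prints a line of the form `salmon X.Y.Z`. The helper name version_at_least is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if version $1 is >= minimum version $2.
# Puts both versions through GNU `sort -V`; if the minimum sorts first (or they
# are equal), the installed version is at least the minimum.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes `salmon --version` prints something like "salmon 1.4.0"):
#   v=$(salmon --version 2>&1 | awk '{print $2}')
#   version_at_least "$v" 1.2 || echo "salmon $v is older than 1.2; consider upgrading"
```

Version sort is used instead of a plain string comparison so that, for example, 1.10.0 correctly counts as newer than 1.2.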

tamuanand • Nov 23 '20 20:11

Hey, I'm having the same kind of problem.

I aligned my PE reads against the transcriptome using BWA-MEM and then sorted them by coordinate (as a regular procedure). I know salmon assumes the alignments are not sorted, so I shuffled these bam files and then ran salmon quant. Here are the errors I got in several trials:
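Since the sort order of the BAM matters here, it may help to check what the header declares. A minimal sketch, assuming samtools is installed; the helper sam_sort_order is hypothetical, and the @HD SO: tag only records what the writing tool declared, not how the records are actually ordered.

```shell
# Hypothetical helper: read a SAM header on stdin and print the SO: (sort order)
# value from the @HD line, e.g. "coordinate", "queryname" or "unsorted".
# Prints "unknown" when there is no @HD line or it carries no SO: tag.
sam_sort_order() {
  awk -F '\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) so = substr($i, 4)
    }
    END { print (so == "" ? "unknown" : so) }'
}

# Usage (assumes samtools is on PATH):
#   samtools view -H SRR3212847.Aligned.SortedByCoord.bam | sam_sort_order
```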

Fresh installation of Salmon

conda create --name salmon -c bioconda salmon
conda activate salmon

1. Shuffling a bam file with samtools collate

samtools collate \
-@ 40 \
-o SRR3212847.Aligned.Shuffled.bam \
SRR3212847.Aligned.SortedByCoord.bam

salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.Shuffled.bam \
-o SRR3212847.Aligned.Shuffled 
Version Info: This is the most recent version of salmon.
# salmon (alignment-based) v1.4.0
# [ program ] => salmon 
# [ command ] => quant 
# [ targets ] => { mRNA.fasta }
# [ threads ] => { 20 }
# [ libType ] => { A }
# [ alignments ] => { SRR3212847.Aligned.Shuffled.bam }
# [ output ] => { SRR3212847.Aligned.Shuffled }
Logs will be written to SRR3212847.Aligned.Shuffled/logs
[2021-01-08 12:43:44.680] [jointLog] [info] setting maxHashResizeThreads to 20
[2021-01-08 12:43:44.680] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
Library format { type:paired end, relative orientation:inward, strandedness:unstranded }
[2021-01-08 12:43:44.711] [jointLog] [info] numQuantThreads = 14
parseThreads = 6
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "SRR3212847.Aligned.Shuffled.bam", fasta = "mRNA.fasta" . . .done

processed 0 reads in current round
Segmentation fault (core dumped)

2. Shuffling a headerless bam file with samtools collate

(I think I saw something about the bam's header in another thread dealing with this issue)

samtools view \
-b \
-@ 40 \
-o SRR3212847.Aligned.SortedByCoord.NoHeader.bam \
SRR3212847.Aligned.SortedByCoord.bam
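One caveat on this step: the BAM format requires a header, and samtools view writes it for BAM output even without -h (only SAM output omits the header by default). The NoHeader.bam file above therefore very likely still has one. A minimal sketch for checking, assuming samtools is installed; count_header_lines is a hypothetical helper.

```shell
# Hypothetical helper: count SAM header lines (lines starting with "@") on stdin.
count_header_lines() {
  awk '/^@/ { n++ } END { print n + 0 }'
}

# Usage (assumes samtools is on PATH); a non-zero count means a header is present:
#   samtools view -H SRR3212847.Aligned.SortedByCoord.NoHeader.bam | count_header_lines
```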

samtools collate \
-@ 40 \
-o SRR3212847.Aligned.Shuffled.NoHeader.bam \
SRR3212847.Aligned.SortedByCoord.NoHeader.bam

salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.Shuffled.NoHeader.bam \
-o SRR3212847.Aligned.Shuffled.NoHeader

....


[2021-01-08 12:42:10.700] [jointLog] [warning] 

WARNING: Detected suspicious pair --- 
        The names are different:
        read1 : SRR3212847.24133171
        read2 : SRR3212847.33911054
        The proper-pair statuses are inconsistent:
read1 [SRR3212847.24133171] : no proper-pair; not mapped; matenot mapped

read2 : [SRR3212847.24133171] : proper-pair; mapped; matemapped

[2021-01-08 12:42:10.700] [jointLog] [warning] 

WARNING: Detected suspicious pair --- 
        The names are different:
        read1 : SRR3212847.33911054
        read2 : SRR3212847.30781941

Segmentation fault (core dumped)
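The "names are different" warnings suggest that two consecutive records were read as a pair even though they came from different fragments. A minimal sketch for checking this directly, under the assumption (implied by the warning text, not confirmed in this thread) that salmon pairs up consecutive paired-end records: it walks the primary paired records two at a time and counts pairs whose QNAMEs disagree. The helper count_unadjacent_mates is hypothetical, and skipping secondary and supplementary records is a simplification of this sketch.

```shell
# Hypothetical helper: read SAM records on stdin and count primary paired-end
# records whose mate is not the adjacent record with the same QNAME.
# Records are consumed two at a time; a pair whose QNAMEs differ counts as one
# problem, and a leftover unpaired record at the end also counts as one.
# Secondary (0x100) and supplementary (0x800) records are skipped in this sketch.
# Flag bits are tested with integer arithmetic, so plain POSIX awk is enough.
count_unadjacent_mates() {
  awk -F '\t' '
    function bit(f, b) { return int(f / b) % 2 }
    /^@/ { next }                                   # ignore header lines if present
    {
      if (!bit($2, 1) || bit($2, 256) || bit($2, 2048)) next
      if (pending == "") { pending = $1; next }     # first record of a candidate pair
      if ($1 == pending) { pending = "" }           # mate is adjacent: OK
      else { bad++; pending = $1 }                  # mismatch: restart with this record
    }
    END { if (pending != "") bad++; print bad + 0 }'
}

# Usage (assumes samtools is on PATH):
#   samtools view SRR3212847.Aligned.Shuffled.bam | count_unadjacent_mates
```

A result of 0 means every primary paired record sits next to its mate; a large count points at the grouping step rather than at the library type.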

3. Sorting with samtools sort -n

samtools sort \
-@ 40 \
-n \
-o SRR3212847.Aligned.SortedByName.bam \
SRR3212847.Aligned.SortedByCoord.bam

salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.SortedByName.bam \
-o SRR3212847.Aligned.SortedByName
Version Info: This is the most recent version of salmon.
# salmon (alignment-based) v1.4.0
# [ program ] => salmon 
# [ command ] => quant 
# [ targets ] => { mRNA.fasta }
# [ threads ] => { 20 }
# [ libType ] => { A }
# [ alignments ] => { SRR3212847.Aligned.SortedByName.bam }
# [ output ] => { SRR3212847.Aligned.SortedByName }
Logs will be written to SRR3212847.Aligned.SortedByName/logs
[2021-01-08 13:02:04.845] [jointLog] [info] setting maxHashResizeThreads to 20
[2021-01-08 13:02:04.845] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
Library format { type:paired end, relative orientation:inward, strandedness:unstranded }
[2021-01-08 13:02:04.878] [jointLog] [info] numQuantThreads = 14
parseThreads = 6
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "SRR3212847.Aligned.SortedByName.bam", fasta = "mRNA.fasta" . . .done

processed 0 reads in current round
Segmentation fault (core dumped)

(This is the same as the first error. In fact, the two errors switched each time I re-ran these commands.)

I also tried running salmon on the coordinate-sorted bam, and it didn't fail:

nohup salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.SortedByCoord.bam \
-o SRR3212847.Aligned.SortedByCoord \
> SRR3212847.Aligned.SortedByCoord.out &

Even so, SRR3212847.Aligned.SortedByCoord.out contained ~3.5 GB of the warnings shown above.
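Rather than paging through a multi-gigabyte log, the warnings can simply be counted. A minimal sketch; count_suspicious_pairs is a hypothetical helper that counts only the lines containing the warning header text shown above.

```shell
# Hypothetical helper: count lines in log file $1 that contain the salmon
# "Detected suspicious pair" warning header. grep -c prints 0 and exits 1 when
# nothing matches; "|| true" keeps that case from looking like a failure.
count_suspicious_pairs() {
  grep -c 'Detected suspicious pair' "$1" || true
}

# Usage:
#   count_suspicious_pairs SRR3212847.Aligned.SortedByCoord.out
```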

Any help would be much appreciated. Thanks!

KaparaNewbie • Jan 08 '21 12:01

Hello, I have the same problem; thanks for your answer. You mentioned that SRR3212847.Aligned.SortedByCoord.out contained ~3.5GB worth of the warnings above: what was the warning message? In my log file, the warning is as follows:

[Screenshot of the warnings in the log file]

Can I ignore these warnings?

MonaLiu421 • Dec 09 '22 02:12

Any update on this? I am having the same error.

pabloacera • Mar 06 '24 02:03