Issue using Salmon quant with BAM files
Hello, we are currently trying to use salmon quant with BAM files that were generated by another lab. We keep getting the warning below and are unsure what the problem is. Our original command used the libtype A parameter, which gave us a similar warning to the one below. We then sorted the BAM files by name using samtools sort -n and ran the command again, but got the same error. We have tried many different libtypes and continue to get the same error. Our reference transcriptome was generated from a de novo transcriptome assembly that the previous lab produced.
WARNING: Detected suspicious pair ---
The names are different:
read1 : D61655M1:276:D10YJACXX:3:1114:17825:17511
read2 : D61655M1:276:D10YJACXX:3:1114:17825:18115
The proper-pair statuses are inconsistent:
read1 [D61655M1:276:D10YJACXX:3:1114:17825:17511] : proper-pair; mapped; matemapped
read2 : [D61655M1:276:D10YJACXX:3:1114:17825:17511] : no proper-pair; mapped; matenot mapped
We have seen that others have had similar issues, but we haven't found a thread that mentions what the user did to remedy the problem. Any suggestions would be appreciated! We are using salmon v0.14.1.
The assumptions made by salmon regarding the BAM files are the same as those made by RSEM (with the exception that salmon will accept alignments with insertions or deletions in the CIGAR strings). If STAR is run with the --quantMode TranscriptomeSAM flag (and without sorting the output BAM file), it should work as is. Any idea how STAR was run here? Presumably you can get the command line from the BAM header. @k3yavi: any suggestion on methods to properly collate the reads?
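For reference, the aligner command line, when the aligner recorded one, lives in the @PG lines of the BAM header. Here is a minimal sketch for pulling it out, assuming the header text is piped in from `samtools view -H`; the function name `list_pg_commands` is hypothetical, and the CL field is optional in the SAM format, so it may simply be absent.

```shell
# Hypothetical helper: reads SAM header text on stdin, e.g.
#   samtools view -H in.bam | list_pg_commands
# and prints "<ID><TAB><CL>" for every @PG record.
# CL is an optional tag, so the second column may be empty.
list_pg_commands() {
  awk -F'\t' '
    $1 == "@PG" {
      id = ""; cl = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)   # program record identifier
        if ($i ~ /^CL:/) cl = substr($i, 4)   # recorded command line
      }
      print id "\t" cl
    }
  '
}
```

If nothing is printed, the header has no @PG records at all, and the aligner cannot be identified this way.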
Hey Rob!
Thanks for the reply! There is no indication within the BAM files that STAR was used, nor did the other lab state that they used it.
Is there any reason you still want to use 0.14.1? You should probably upgrade to 1.3.0.
This is not to suggest that upgrading will fix your issue. Salmon has gained many new features since v1 (@rob-p can speak to them), but IMO you should be using salmon >= 1.2.
Hey, I'm having the same kind of problem.
I aligned my PE reads against the transcriptome using BWA-MEM and then sorted them by coordinate (as a regular procedure). I know salmon assumes the alignments are not sorted, so I shuffled these BAM files and then ran salmon quant.
Here are the errors I got in a number of trials:
Fresh installation of Salmon
conda create --name salmon -c bioconda salmon
conda activate salmon
1. Shuffling a bam file with samtools collate
samtools collate \
-@ 40 \
-o SRR3212847.Aligned.Shuffled.bam \
SRR3212847.Aligned.SortedByCoord.bam
salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.Shuffled.bam \
-o SRR3212847.Aligned.Shuffled
Version Info: This is the most recent version of salmon.
# salmon (alignment-based) v1.4.0
# [ program ] => salmon
# [ command ] => quant
# [ targets ] => { mRNA.fasta }
# [ threads ] => { 20 }
# [ libType ] => { A }
# [ alignments ] => { SRR3212847.Aligned.Shuffled.bam }
# [ output ] => { SRR3212847.Aligned.Shuffled }
Logs will be written to SRR3212847.Aligned.Shuffled/logs
[2021-01-08 12:43:44.680] [jointLog] [info] setting maxHashResizeThreads to 20
[2021-01-08 12:43:44.680] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
Library format { type:paired end, relative orientation:inward, strandedness:unstranded }
[2021-01-08 12:43:44.711] [jointLog] [info] numQuantThreads = 14
parseThreads = 6
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "SRR3212847.Aligned.Shuffled.bam", fasta = "mRNA.fasta" . . .done
processed 0 reads in current roundSegmentation fault (core dumped)
2. Shuffling a headless bam file with samtools collate
(I think I saw something about the bam's header in another thread dealing with this issue)
samtools view \
-b \
-@ 40 \
-o SRR3212847.Aligned.SortedByCoord.NoHeader.bam \
SRR3212847.Aligned.SortedByCoord.bam
samtools collate \
-@ 40 \
-o SRR3212847.Aligned.Shuffled.NoHeader.bam \
SRR3212847.Aligned.SortedByCoord.NoHeader.bam
salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.Shuffled.NoHeader.bam \
-o SRR3212847.Aligned.Shuffled.NoHeader
....
[2021-01-08 12:42:10.700] [jointLog] [warning]
WARNING: Detected suspicious pair ---
The names are different:
read1 : SRR3212847.24133171
read2 : SRR3212847.33911054
The proper-pair statuses are inconsistent:
read1 [SRR3212847.24133171] : no proper-pair; not mapped; matenot mapped
read2 : [SRR3212847.24133171] : proper-pair; mapped; matemapped
[2021-01-08 12:42:10.700] [jointLog] [warning]
WARNING: Detected suspicious pair ---
The names are different:
read1 : SRR3212847.33911054
read2 : SRR3212847.30781941
Segmentation fault (core dumped)
3. Sorting with samtools sort -n
samtools sort \
-@ 40 \
-n \
-o SRR3212847.Aligned.SortedByName.bam \
SRR3212847.Aligned.SortedByCoord.bam
salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.SortedByName.bam \
-o SRR3212847.Aligned.SortedByName
Version Info: This is the most recent version of salmon.
# salmon (alignment-based) v1.4.0
# [ program ] => salmon
# [ command ] => quant
# [ targets ] => { mRNA.fasta }
# [ threads ] => { 20 }
# [ libType ] => { A }
# [ alignments ] => { SRR3212847.Aligned.SortedByName.bam }
# [ output ] => { SRR3212847.Aligned.SortedByName }
Logs will be written to SRR3212847.Aligned.SortedByName/logs
[2021-01-08 13:02:04.845] [jointLog] [info] setting maxHashResizeThreads to 20
[2021-01-08 13:02:04.845] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
Library format { type:paired end, relative orientation:inward, strandedness:unstranded }
[2021-01-08 13:02:04.878] [jointLog] [info] numQuantThreads = 14
parseThreads = 6
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "SRR3212847.Aligned.SortedByName.bam", fasta = "mRNA.fasta" . . .done
processed 0 reads in current roundSegmentation fault (core dumped)
(This is the same as the 1st error. Actually, each time I re-ran the commands, these two errors swapped.)
I tried running Salmon on the sorted-by-coordinates bam, and it didn't fail:
nohup salmon quant \
-t mRNA.fasta \
-p 20 \
-l A \
-a SRR3212847.Aligned.SortedByCoord.bam \
-o SRR3212847.Aligned.SortedByCoord \
> SRR3212847.Aligned.SortedByCoord.out &
Even so, SRR3212847.Aligned.SortedByCoord.out contained ~3.5GB worth of the warnings above.
Any help would be much appreciated. Thanks!
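To gauge how widespread the name mismatches are before handing a BAM to salmon, something like the following could help. It is only a rough sketch, assuming that the two mates of a fragment should sit on consecutive records (which is what the "names are different" warning above is complaining about). `count_mismatched_pairs` is a hypothetical helper, not part of salmon or samtools. It naively pairs records two at a time without looking at the flags, so any extra record (for example a secondary or supplementary alignment) will shift the pairing.

```shell
# Hypothetical helper: reads SAM records on stdin, e.g.
#   samtools view in.bam | count_mismatched_pairs
# and prints how many consecutive record pairs have different read names.
count_mismatched_pairs() {
  awk -F'\t' '
    /^@/ { next }                 # skip header lines if present
    {
      if (have_first) {
        # second record of the pair: compare its name with the first
        if ($1 != first) bad++
        have_first = 0
      } else {
        # first record of the pair: remember its name
        first = $1
        have_first = 1
      }
    }
    END { print bad + 0 }         # print 0 when no mismatch was seen
  '
}
```

A result of 0 only means that every two-record pair shares a name; it does not check the proper-pair flags that the warning also reports.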
Hello, I have the same problem; thanks for your answer. Your SRR3212847.Aligned.SortedByCoord.out contained ~3.5GB worth of the warnings above. What is the warning message? In my log file, the warning is as follows:
Can I ignore these warnings?
Any update on this? I am having the same error