salmon
Lower mapping percentages
To whom it may concern,
I have been using Salmon to quantify RNA-seq data using a new long-read RNA sequencing-based GTF I have developed. When I run Salmon on RNA-seq samples from TCGA (read length = 50 bp, kmer length = 21), I tend to get ~95% of reads mapping to my transcriptome.
However, when I use the same script to run my pipeline on in-house sequenced data (read length = 150 bp, kmer length = 21), I am getting only around 80-85% of reads mapping to my transcriptome. According to STAR, >90% (usually >95%) of these same in-house samples mapped to the genome. Why am I getting lower mapping rates? Could read length have something to do with it? Thanks so much for any advice or guidance you can provide.
Script: 5_runSalmon.sh.zip (The only difference between my TCGA and in-house runs is that for TCGA I use "-l IU" and for my in-house samples I use "-l ISR", due to differences in the strandedness of the prep protocols.)
Yours most sincerely, Ryan Englander
Hi @ryanpe13002,
This is totally expected. The primary reason is that the overall mapping fraction reported by STAR is mapping to the entire genome (not all of those reads will be used to quantify expression), while salmon is reporting the mapping rate only to the annotated transcriptome. Thus, there will often be a difference in the mapping rates reported by these tools.
Best, Rob
Hi Rob,
Entirely understand the discrepancy between STAR and Salmon mapping rates, but why was the mapping rate between TCGA and my in-house samples so different? Further, what is a canonically "good" mapping rate, i.e., at what % mapped should I start to get concerned that my sample quality might not be that great?
Thanks so much, Ryan
Hi Ryan,
Sure; there's no canonically "good" threshold, but I would consider a mapping rate of 80-85% to the annotated transcriptome to be rather good. I'd start paying attention if things dip into, e.g., the 70s (even that may be OK, but then I'd run other diagnostics on the samples). This is, of course, anecdotal, and it's hard to say too much from the mapping rate alone.
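As a rough illustration of that rule of thumb, here is a minimal Python sketch that reads the mapping rate from a Salmon output directory and flags samples that might deserve a closer look. It assumes the run wrote aux_info/meta_info.json with the fields percent_mapped, num_mapped, and num_processed (worth confirming against your Salmon version); the 80% cutoff and the helper names (load_meta, mapping_rate, triage) are hypothetical and simply encode the anecdotal ranges above, not a hard rule.

```python
import json
from pathlib import Path


def load_meta(quant_dir):
    """Parse aux_info/meta_info.json from a Salmon quant output directory.

    The file location and field names are assumptions; check your own output.
    """
    path = Path(quant_dir) / "aux_info" / "meta_info.json"
    with open(path) as fh:
        return json.load(fh)


def mapping_rate(meta):
    """Return the mapping rate in percent from a parsed meta_info dict."""
    if "percent_mapped" in meta:
        return float(meta["percent_mapped"])
    # Fall back to computing the rate from the raw counts.
    return 100.0 * meta["num_mapped"] / meta["num_processed"]


def triage(rate, warn_below=80.0):
    """Label a sample 'ok' or 'inspect' using an anecdotal cutoff (assumption)."""
    return "ok" if rate >= warn_below else "inspect"
```

For example, `triage(mapping_rate(load_meta("quants/sample1")))` would label one sample, where "quants/sample1" is a placeholder path. A sample labelled "inspect" is not necessarily bad; it only marks it for further diagnostics.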
You can see how much of an effect the library type might be having by looking at the lib_format_counts.json file to see how many reads map in a manner incompatible with the ISR library type you're using.
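A small Python sketch of that check, in case it helps: it computes the fraction of assigned fragments that are compatible with the expected library type. It assumes lib_format_counts.json contains the fields compatible_fragment_ratio, num_compatible_fragments, and num_assigned_fragments, which is how I recall recent Salmon versions writing it; please verify against your own file. The function name compatible_fraction is hypothetical.

```python
import json


def compatible_fraction(counts):
    """Fraction of assigned fragments compatible with the expected library type.

    `counts` is the parsed lib_format_counts.json dict. Field names are
    assumptions based on recent Salmon output and may differ by version.
    """
    ratio = counts.get("compatible_fragment_ratio")
    if ratio is not None:
        return float(ratio)
    # Fall back to the raw counts if the precomputed ratio is absent.
    assigned = counts["num_assigned_fragments"]
    if assigned == 0:
        return 0.0
    return counts["num_compatible_fragments"] / assigned


def load_lib_counts(path):
    """Read a lib_format_counts.json file into a dict."""
    with open(path) as fh:
        return json.load(fh)
```

Calling `compatible_fraction(load_lib_counts("quants/sample1/lib_format_counts.json"))` (placeholder path) gives a value close to 1.0 when nearly all reads agree with the declared library type; a markedly lower value would suggest the library type is costing you mappings.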
Best, Rob
Hi Rob,
Thank you so much for the explanation! Most of the samples were 80-85%, but some did dip as low as 75%; I ran FASTQC on all the samples before running the pipeline and they all looked fine (quite good quality, in fact). I also checked the lib_format_counts.json file for a few of the "problem" samples and it looks as you'd expect (~99% of reads map consistent with ISR orientation).
Are there other diagnostics you might recommend running?
Thanks so much, Ryan