Guidance on Minimap2 Settings for Quantification of ONT Reads in Alignment Mode
I'm looking for some guidance on recommendations / best practices for quantifying direct RNA / cDNA Nanopore reads using Salmon. It is my understanding that the selective alignment algorithm Salmon employs is not well-suited for long reads (#602), and therefore the software needs to be run in alignment mode for accurate counting. My main question then concerns the optimal parameters for the upstream alignment step. The ONT community seems to have settled on using minimap2 for this, but beyond that the guidance gets a bit murky...
The minimap2 documentation suggests the following command for mapping long RNA reads:
minimap2 -ax splice:hq -uf ref.fa reads.fq > aln.sam
This approach seems to employ a splicing-aware algorithm against a genomic reference, using canonical splicing signals to help map the transcripts. However, this method doesn't seem to be applicable to Salmon, given the requirement that the reads are aligned directly to the transcriptome (hence the need to account for splicing with '-ax splice' is lost). An alternative approach I've seen (i.e., the one used in ONT's own DGE pipeline) is to use minimap2 to align to the transcriptome reference but to retain a large number of secondary mappings (-N 100 in minimap2):
minimap2 -ax map-ont -N 100 transcriptome.fa reads.fq
This makes more sense in terms of the -ax preset used, but I guess I'm just wondering what the optimal input for Salmon would be in order to get the most accurate count data. I know secondary mappings are important for the algorithm to calculate uncertainty / maximum likelihood, but is there a recommended number of these to retain? The logic behind allowing for a high number of secondary alignments when using a transcriptome reference is to account for the high similarity among isoforms. From a high-level view I could see how this might be problematic though, depending on how Salmon actually uses the alternate mappings (i.e., is it just for the statistics or does it affect the counts as well?).
I've also seen groups toying with the -p setting in minimap2, which sets the minimum ratio of secondary to primary alignment score required for a secondary mapping to be reported. Surveying the forums and discussion boards, values of -N ranging from the default of 5 to 100 and of -p ranging from 0 to 1 (i.e., anything) seem to be acceptable. Given this ambiguity, I figured going to the 'source' and asking the creators what Salmon actually wants might be beneficial, so if y'all have done any testing or have recommendations I'd be very appreciative.
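One way to see what a given -N / -p combination actually does is to count the alignment records each read ends up with. The helper below is only a rough sketch: the function name `count_alignments_per_read` is made up for illustration, and it relies on nothing beyond the standard SAM FLAG bits (0x4 for unmapped, 0x100 for secondary). It prints read name, total alignment records, and how many of those are secondary.

```shell
# Hypothetical helper: per-read alignment counts from SAM text (file or stdin).
# Output columns: read name, total alignment records, secondary records.
count_alignments_per_read() {
    awk -F'\t' '
        /^@/ { next }                       # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }       # FLAG bit 0x4 set: read is unmapped
        { total[$1]++ }                     # count every remaining record
        int($2 / 256) % 2 == 1 { secondary[$1]++ }   # FLAG bit 0x100: secondary
        END {
            for (r in total)
                printf "%s\t%d\t%d\n", r, total[r], secondary[r] + 0
        }
    ' "$@" | sort
}
```

For example, `minimap2 -ax map-ont -N 100 transcriptome.fa reads.fq | count_alignments_per_read` could be run once per setting and the outputs compared. The total column also includes supplementary records, if any are present.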
Sorry to call you out directly @rob-p, but do you have any quick thoughts on this or know someone that might?
@NanoCoreUSA I am in the same boat now. What did you settle on?
@mousepixels Apologies for the slow reply. I found myself circling back to this same issue with another project and thought I'd update the thread.
Seems like there was some guidance all along regarding dealing with ONT data. See this link.
To summarize, it looks like they advise -N 100 -p 1.0 for minimap2, which coincidentally is what I have been doing as well. Hope that's helpful if you haven't already come up with a strategy.
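For anyone landing here later, here is a minimal sketch of how those settings might slot into an alignment-mode run. Treat it as an assumption-laden outline rather than a vetted pipeline: the function name `ont_quant_commands`, the file names, the output directory, and the thread count are all placeholders, and the `--ont` flag assumes a Salmon release recent enough to include the long-read error model. The function only prints the two commands so they can be inspected before running.

```shell
# Hypothetical helper: print (do not run) a minimap2 + Salmon alignment-mode pair.
# Args: transcriptome FASTA, reads FASTQ, thread count (all placeholder defaults).
ont_quant_commands() {
    local transcriptome="${1:-transcriptome.fa}"
    local reads="${2:-reads.fq}"
    local threads="${3:-4}"

    # Align to the transcriptome, keeping up to 100 secondary alignments (-N 100)
    # whose score is at least as good as the primary (-p 1.0).
    # The BAM is left unsorted so that records for each read stay together.
    echo "minimap2 -t ${threads} -ax map-ont -N 100 -p 1.0 ${transcriptome} ${reads} | samtools view -b -o aln.bam -"

    # Quantify in alignment mode (-a) against the same transcriptome (-t).
    # --ont is assumed to be available in the installed Salmon version.
    echo "salmon quant --ont -t ${transcriptome} -l A -a aln.bam -p ${threads} -o salmon_out"
}
```

Calling `ont_quant_commands transcriptome.fa reads.fq 8` prints both lines. Once they look right for your setup, they can be copied into a script or piped to `bash`.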