salmon
Q: how to deal with unstranded features?
Hello,
Thank you very much for this great tool salmon. I currently have a question about how to use it for my data.
First of all, the libtype should be ISR. However, some (not all) of the transcripts I am interested in are unstranded (each one actually lies on only one strand, but we don't know which one). What should I do about this?
- Assume they are on the + strand and extract the transcript sequences, then run salmon with libtype IU.
- Assume they are on the + strand and extract the transcript sequences, then run salmon with libtype ISR and set --incompatPrior to a non-zero value (but what value is suitable?).
- Generate 2 sequences for each of those transcripts, one for the + strand and one for the - strand, then run salmon with libtype ISR. Finally, sum the counts of the 2 sequences.
Which way is more reasonable, or is there a better way to do this?
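To make the third option concrete, here is a minimal Python sketch of how the doubled reference could be built before indexing. The function names and the `_plus` / `_minus` suffixes are hypothetical choices for illustration, not anything salmon requires, and the transcripts are assumed to be already loaded as a name-to-sequence dictionary.

```python
from typing import Dict, Iterable

# Translation table for DNA bases; anything else (e.g. IUPAC codes) is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_both_strands(records: Dict[str, str], unstranded: Iterable[str]) -> Dict[str, str]:
    """Duplicate each unstranded transcript as a +/- pair; keep the rest as-is.

    The "_plus" / "_minus" suffixes are a hypothetical naming scheme used
    only so the pair can be matched up again after quantification.
    """
    unstranded_names = set(unstranded)
    out: Dict[str, str] = {}
    for name, seq in records.items():
        if name in unstranded_names:
            out[name + "_plus"] = seq
            out[name + "_minus"] = reverse_complement(seq)
        else:
            out[name] = seq
    return out
```

The resulting dictionary would then be written out as FASTA and indexed as usual, with quantification still run with libtype ISR.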
BTW, the changelog of v0.10.2 says:
"The new behavior is equivalent to running with the option --incompatPrior 0"
but the doc still says:
Note that Salmon sets this value, by default, to a small but non-zero probability. This means that if an incompatible mapping is the only mapping for a fragment, Salmon will still assign this fragment to the transcript.
Which one is the current default behavior?
Hi @qifei9,
Thanks for the question and for pointing out that the docs need updating. Regarding your first question, approaches (3) and (2) both seem reasonable to me. I would not try approach (1), as it would eliminate the benefit of the stranded library for the targets whose orientation you do know. For approach (2), I'd either use --validateMappings or at least set --rangeFactorizationBins 4 (the former implies the latter). As for what value to set for --incompatPrior, the effect should be reasonably robust across a range of values. The question is how unlikely, a priori, you would expect a mapping not in ISR orientation to be if you also observed a mapping in ISR orientation: probably very unlikely (you could try e.g. 1e-10 or some such). Approach (3) is also reasonable, though you might consider looking at the abundances for the opposite strands of the same sequence after quantification. You should generally see that only one of the two has non-zero expression, or at least that one orientation has much higher expression than the other (for expressed transcripts, at least, this might give you evidence as to the true strand of origin).
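For the post-quantification check on approach (3), something along these lines might help. It is only a sketch: it assumes the two orientations were named with hypothetical `_plus` / `_minus` suffixes when the reference was built, and that the input is the text of a quant.sf file with its usual tab-separated Name and NumReads columns. The function name is made up for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize_strand_pairs(quant_sf_text, plus_suffix="_plus", minus_suffix="_minus"):
    """Sum NumReads over each +/- pair and report the dominant orientation.

    The suffixes are an assumed naming convention, not something salmon defines.
    Transcripts without either suffix are ignored here.
    """
    reads = defaultdict(lambda: {"plus": 0.0, "minus": 0.0})
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        name = row["Name"]
        num_reads = float(row["NumReads"])
        if name.endswith(plus_suffix):
            reads[name[: -len(plus_suffix)]]["plus"] += num_reads
        elif name.endswith(minus_suffix):
            reads[name[: -len(minus_suffix)]]["minus"] += num_reads

    summary = {}
    for base, r in reads.items():
        total = r["plus"] + r["minus"]
        if total == 0:
            # Unexpressed pair: no evidence about the strand of origin.
            dominant, fraction = None, None
        else:
            dominant = "plus" if r["plus"] >= r["minus"] else "minus"
            fraction = max(r["plus"], r["minus"]) / total
        summary[base] = {"total": total, "dominant": dominant, "fraction": fraction}
    return summary
```

A pair whose dominant fraction is close to 1 suggests a clear strand of origin; a fraction near 0.5, or a very low total, means the data do not say much either way.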
Regarding your second point, the changelog is correct. In recent versions of salmon, --incompatPrior is 0 by default. We'll update the documentation accordingly.
@rob-p Many thanks for your suggestions. I will try approach (3) first. Thanks!
Hi @rob-p, I believe the docs for --incompatPrior still need to be updated to reflect the default behaviour.