
Q: how to deal with unstranded features?

Open qifei9 opened this issue 6 years ago • 3 comments

Hello,

Thank you very much for this great tool salmon. I currently have a question about how to use it for my data.

First of all, the libtype should be ISR. However, some (not all) of the transcripts that I am interested in are unstranded (actually each of them is on only one strand, but we don't know which one). What should I do in this case?

  1. Assume they are on the + strand and extract the transcript sequences accordingly. Then run salmon with libtype IU.
  2. Assume they are on the + strand and extract the transcript sequences accordingly. Then run salmon with libtype ISR and set --incompatPrior to a non-zero value (but what value is suitable?).
  3. Provide 2 sequences for each of those transcripts, one for the + strand and one for the - strand. Then run salmon with libtype ISR. Finally, sum the counts of the 2 sequences.
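
For option 3, the two orientations could be generated along these lines. This is only a minimal sketch: the `_plus`/`_minus` name suffixes and the helper names are assumptions made for illustration, not a salmon convention, and the example sequence is made up.

```python
# Sketch: emit both orientations of each unstranded transcript.
# The "_plus"/"_minus" suffixes are a hypothetical naming scheme.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_strands(records):
    """Yield (name, sequence) for each orientation of every input record."""
    for name, seq in records:
        yield f"{name}_plus", seq
        yield f"{name}_minus", reverse_complement(seq)


def to_fasta(records) -> str:
    """Format (name, sequence) pairs as FASTA text, one line per sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Made-up record standing in for an unstranded transcript.
example = [("tx1", "AACGT")]
print(to_fasta(both_strands(example)))
```

The resulting records would be appended to the rest of the transcriptome (which keeps its known orientation) before building the salmon index.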

Which of these approaches is more reasonable, or is there a better way to do this?


BTW, the changelog of v0.10.2 says:

The new behavior is equivalent to running with the option --incompatPrior 0

but the doc still says:

Note that Salmon sets this value, by default, to a small but non-zero probability. This means that if an incompatible mapping is the only mapping for a fragment, Salmon will still assign this fragment to the transcript.

Which one is the current default behavior?

qifei9 avatar Jan 02 '19 09:01 qifei9

Hi @qifei9,

Thanks for the question and for pointing out that the docs need updating. Regarding your first question, both approaches (3) and (2) seem reasonable to me. I would not try approach (1), as it would eliminate the benefit of the stranded library for the targets whose orientation you do know.

For approach (2), I'd either use --validateMappings or at least set --rangeFactorizationBins 4 (the former implies the latter). As for what value to set for --incompatPrior, the effect should be reasonably robust across a range of values. The question is how unlikely, a priori, you would expect a mapping not in ISR orientation to be if you also observed a mapping in ISR orientation: probably very unlikely (you could try e.g. 1e-10 or some such).

Approach (3) is also reasonable, though you might consider looking at the abundances for the two opposite-strand copies of each sequence after quantification. You should generally see that only one of the two has non-zero expression, or at least that one orientation has much higher expression than the other. For expressed transcripts, at least, this might give you evidence as to the true strand of origin.
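
The post-quantification check for approach (3) might look roughly like the following. It is a sketch under stated assumptions: the two copies of each transcript are assumed to be named with hypothetical `_plus` and `_minus` suffixes, the input is the text of salmon's `quant.sf` (tab-separated, with `Name` and `NumReads` columns), and the numbers in the example are invented purely to exercise the code.

```python
import csv
import io
from collections import defaultdict


def sum_strand_pairs(quant_sf_text, suffixes=("_plus", "_minus")):
    """Collect NumReads per orientation and their sum for suffixed transcripts.

    Transcripts whose names carry neither suffix (the ones with a known
    strand) are skipped.
    """
    totals = defaultdict(lambda: {s: 0.0 for s in suffixes})
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        for suffix in suffixes:
            if row["Name"].endswith(suffix):
                base = row["Name"][: -len(suffix)]
                totals[base][suffix] = float(row["NumReads"])
                break
    return {
        base: {**counts, "total": sum(counts.values())}
        for base, counts in totals.items()
    }


# Invented values, only to show the expected layout of quant.sf.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1_plus\t1000\t850.0\t12.5\t240.0\n"
    "tx1_minus\t1000\t850.0\t0.1\t2.0\n"
    "known_tx\t900\t750.0\t3.0\t50.0\n"
)
for name, counts in sum_strand_pairs(example).items():
    print(name, counts)
```

Comparing the two per-orientation counts for each transcript gives the strand evidence mentioned above, and `total` is the summed count from the original question.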

Regarding your second point, the changelog is correct. In recent versions of salmon, --incompatPrior is 0 by default. We'll update the documentation accordingly.
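
Since the default has changed between versions, one option is to pass --incompatPrior explicitly so the run does not depend on it. Below is a minimal sketch that only assembles the command line; the function name and all paths are placeholders, and whether --validateMappings is still needed depends on the salmon version in use.

```python
import shlex


def salmon_quant_command(index, reads1, reads2, out_dir, incompat_prior=0.0):
    """Build a `salmon quant` argument list with an explicit --incompatPrior."""
    return [
        "salmon", "quant",
        "-i", index,
        "-l", "ISR",
        "-1", reads1,
        "-2", reads2,
        "--validateMappings",
        "--incompatPrior", str(incompat_prior),
        "-o", out_dir,
    ]


# Placeholder paths; 1e-10 is the example value suggested for approach (2).
cmd = salmon_quant_command(
    "salmon_index", "reads_1.fq.gz", "reads_2.fq.gz", "quant_out",
    incompat_prior=1e-10,
)
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.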

rob-p avatar Jan 02 '19 17:01 rob-p

@rob-p Many thanks for your suggestions. I will try approach (3) first. Thanks!

qifei9 avatar Jan 03 '19 03:01 qifei9

Hi @rob-p, I believe the docs for --incompatPrior still need to be updated to reflect the default behaviour.

prmac avatar Nov 30 '22 09:11 prmac