
Degraded mRNA

Open signor-molevol opened this issue 5 years ago • 1 comments

This may not be an appropriate question for the GitHub issues section. However, the ShortStack paper does say that in their mouse small RNA library many of the sRNAs had N dicer calls, and that this was likely due to contamination from degraded RNA.

My short RNA libraries from Drosophila are 87% N dicer calls using ShortStack.

Other papers recommend detecting contamination from degraded mRNA by looking for unusual coverage of an abundantly expressed gene such as GAPDH. While 'unusual coverage' is not defined, I did that and do not see what I would think of as unusual coverage.

(image included of reads from the bam file in the GAPDH region)

What other issues could cause mostly N calls? Is the metric that ShortStack uses to differentiate N from non-N calls robust, and is there some way of evaluating it? Has it been evaluated anywhere?


signor-molevol avatar Mar 18 '19 20:03 signor-molevol

So, sorry to have ignored this for over a year! Basically, the N call is made when a cluster has less than 80% of its reads in the size range defined by the dicermin and dicermax options. These are set to 20-24 nts by default, which makes sense for plant small RNAs. In flies, especially in reproductive tissues, you will have a lot of piRNAs, which are longer, right? Perhaps these are piRNA clusters. Did you try adjusting the dicermin and dicermax settings?

The metric is robust, but it really depends on the a priori assumptions about which RNA sizes are 'valid', as set by the dicermin and dicermax options.

MikeAxtell avatar Nov 03 '20 14:11 MikeAxtell
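
The 80% rule described in the reply above can be illustrated with a minimal Python sketch. This is not ShortStack's own code: the function name `dicer_call`, the `threshold` parameter, and the example read lengths are hypothetical. Reporting the most common in-range size for a non-N cluster is also an assumption made for illustration. Only the 80% cutoff and the 20-24 nt default window come from the reply.

```python
from collections import Counter


def dicer_call(read_lengths, dicermin=20, dicermax=24, threshold=0.8):
    """Return 'N' or the predominant in-range read size for one cluster.

    Hypothetical helper: it mirrors the rule described in the thread,
    not ShortStack's actual implementation.
    """
    if not read_lengths:
        return "N"
    # Reads whose length falls inside the assumed 'valid' size window.
    in_range = [n for n in read_lengths if dicermin <= n <= dicermax]
    # Less than 80% of reads in the window gives an N call.
    if len(in_range) / len(read_lengths) < threshold:
        return "N"
    # Assumption: a non-N call reports the most common in-range size.
    return str(Counter(in_range).most_common(1)[0][0])


# Made-up piRNA-like cluster: most reads are 25-27 nt long.
pirna_like = [26] * 40 + [27] * 30 + [25] * 20 + [22] * 10

print(dicer_call(pirna_like))                            # N
print(dicer_call(pirna_like, dicermin=20, dicermax=30))  # 26
```

With the default window, only 10% of the made-up reads fall within 20-24 nt, so the cluster is called N. Widening the window to 20-30 nt puts every read in range and the call becomes 26. This is why a library rich in longer small RNAs could show mostly N calls without any degradation.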