drop
Help with interpretation (Does the call look genuine?)
Hello,
I've run DROP on a single sample (using ~30 normal samples from SRA as the reference) and am now going through the results under Aberrant Splicing > Pipeline > Fraser > Results Table. There are several Theta-type events, all with a width of 2. From what I understood, Theta refers to partial or full intron retention, exon elongation, or exon truncation, so would it be fair to guess that intron retention is present in the sample? Please see the screenshot of the RNA data below (Theta event highlighted in red).
Its values are as follows: pValue = 4.78e-7, pAdjust = 0.019, zScore = 4.41
From what I understood from the FRASER documentation, the thresholds indicated there were only meant for the test dataset. Are there any cut-offs that would be a good place to start? Are these values good enough to be considered significant?
The number of reads in IGV doesn't seem reassuring to me. RNA-SeQC reports a mean coverage of 10.18 for the gene; is that too low to be considered for analysis?
Hi, thanks for using FRASER through DROP. Great that you were able to set up your analysis using control samples. Regarding the cutoffs we suggest the following:
- padjust = 0.05
- deltapsi = 0.3
We don't really use the z score for splicing, as psi is a more appropriate metric. FRASER filters out junctions with low counts. In this case, your gene of interest does seem to have low coverage, but I'd say not too low. Note that a stop codon might be found in the retained intron, thus activating NMD. In that case, expression of the affected allele will be reduced and the event may not be detected as a splicing outlier. We suggest complementing your splicing analysis by also performing the expression analysis. If you do, the recommended cutoff is simply padjust = 0.05.
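As a rough illustration of applying these two cutoffs to an exported results table, here is a minimal Python sketch. It is not part of DROP or FRASER. The column names (`sampleID`, `hgncSymbol`, `type`, `padjust`, `deltaPsi`) are assumptions about how the table is laid out, and the rows are invented for the example (only the padjust of 0.019 is taken from the question above). It treats the deltapsi cutoff as applying in either direction.

```python
# Hypothetical rows standing in for an exported splicing results table.
# Column names are assumed; all values except padjust = 0.019 are made up.
rows = [
    {"sampleID": "S1", "hgncSymbol": "GENE_A", "type": "theta", "padjust": 0.019, "deltaPsi": 0.42},
    {"sampleID": "S1", "hgncSymbol": "GENE_B", "type": "psi5", "padjust": 0.20, "deltaPsi": -0.55},
    {"sampleID": "S1", "hgncSymbol": "GENE_C", "type": "psi3", "padjust": 0.001, "deltaPsi": -0.10},
    {"sampleID": "S1", "hgncSymbol": "GENE_D", "type": "theta", "padjust": 0.03, "deltaPsi": -0.35},
]


def is_splicing_outlier(row, padj_cutoff=0.05, delta_psi_cutoff=0.3):
    """Keep an event only if it passes both the FDR and the effect-size cutoff."""
    significant = row["padjust"] <= padj_cutoff
    large_effect = abs(row["deltaPsi"]) >= delta_psi_cutoff
    return significant and large_effect


outliers = [row for row in rows if is_splicing_outlier(row)]

for row in outliers:
    print(row["sampleID"], row["hgncSymbol"], row["type"], row["padjust"], row["deltaPsi"])
```

With these made-up rows, GENE_B is dropped because its padjust is too high and GENE_C because its change in psi is too small, even though its padjust is very low.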
Hello,
I've summarized what I've understood so far; please let me know if it's correct:
- padjust for aberrant splicing events should ideally be below 0.05
- deltapsi for aberrant splicing events should be below 0.3
- In the case of NMD, there may be fewer reads containing the aberrant event, so it might not be detected as a splicing outlier; using additional data (e.g. more GTEx samples) and comparing against the aberrant expression results may help
- If aberrant expression is also detected for the gene with the splicing event, it warrants further investigation
- However, if there are no corroborating expression events, we can a) check other splicing events in other genes or b) investigate the event further, but I'm not sure what else can be done
And a bit more about the analysis
- It is best to use control samples that were sequenced in the same lab and come from the same age group / population. If these are not available, we could use counts from GTEx or other experiments; however, this may influence the results, and some events may not be picked up
- In the OUTRIDER heatmap plot, some clustering is still visible even after the autoencoder correction. This too can affect the end results
Sorry for the delayed response. Point by point
- Ideally the padj values reflect your acceptable false discovery rate (0.05 being a classic threshold)
- deltapsi refers to the change in splicing usage. As a result, values greater than 0.3 or less than -0.3 correspond to changes of at least 30% in either direction.
- NMD would be more evident in the aberrantExpression module, but depending on the mechanism and location of the variants it could also be seen in aberrantSplicing if it affected the junction usage. Both modules offer evidence and can be investigated independently or in conjunction
- You can also use the rnaVariantCalling module or investigate other VCF files that may help you identify rare variants responsible for the splicing aberration.
- DROP isn't a control-vs-case kind of analysis: each sample is compared to the group to see if it behaves differently. The goal is that these differences aren't attributed to confounding variables. FRASER and OUTRIDER use autoencoders to reduce this noise.
- I don't see the heatmap you're referring to, but it just means that there is some dependency inherent in the data.
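To make the "independently or in conjunction" point concrete, here is a small Python sketch of cross-referencing the outlier calls from the two modules by sample and gene. It is only an illustration, not something DROP provides. The (sample, gene) pairs are invented, and `combine_evidence` is a hypothetical helper name.

```python
# Hypothetical (sampleID, gene) pairs that passed the cutoffs in each module.
splicing_hits = [("S1", "GENE_A"), ("S1", "GENE_D")]
expression_hits = [("S1", "GENE_A"), ("S1", "GENE_E")]


def combine_evidence(splicing, expression):
    """Group calls by whether one or both modules flagged the sample/gene pair."""
    splicing, expression = set(splicing), set(expression)
    return {
        "both": sorted(splicing & expression),
        "splicing_only": sorted(splicing - expression),
        "expression_only": sorted(expression - splicing),
    }


evidence = combine_evidence(splicing_hits, expression_hits)

for category, pairs in evidence.items():
    print(category, pairs)
```

Pairs in "both" would be the first candidates to follow up, while an "expression_only" hit could still fit the NMD scenario described above, where the aberrant transcript is degraded before it can be seen as a splicing outlier.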