Feature Request: Variable boundaries like SUPPA

Open Alex-Nesta opened this issue 4 years ago • 3 comments

I've seen a few issues posted here that describe large numbers of alternative first and alternative last events. These could simply be due to a noisy reference GTF. SUPPA makes it easy to limit this problem by defining a variable boundary region for first and last exons. Could this feature be added to Whippet?
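To illustrate the idea of a variable boundary, here is a minimal sketch in which first or last exon boundaries that lie within a small tolerance of each other are treated as one boundary. The helper name `cluster_boundaries` and the 10 nt default are assumptions made for this example; this is not how SUPPA or Whippet implement it.

```python
def cluster_boundaries(positions, tolerance=10):
    """Group coordinates so that neighbours within `tolerance` nt share a cluster.

    Hypothetical helper for illustration only. It uses simple chaining:
    a position joins the current cluster if it is within `tolerance`
    of the last position already in that cluster.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= tolerance:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Toy transcript start coordinates from a noisy annotation (made-up values).
starts = [100, 104, 250, 255, 400]
merged = cluster_boundaries(starts, tolerance=10)
```

With chaining like this, a long run of closely spaced boundaries can end up in a single cluster, so the tolerance would need to be chosen with the annotation in mind.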

Alex-Nesta, Apr 03 '20 16:04

Hi Alex, I experienced exactly the same thing in my datasets (polyA+ unstranded RNA-seq), but I never got feedback on this subset of abundant and highly significant events. I agree that what you are proposing could be an option to get rid of these events from Whippet's output files.

JamalEH, Apr 12 '20 16:04

@Alex-Nesta Yeah, these are due to the reference GTF. Sure, this could be added, but I doubt I'll get to it in this version of Whippet. I think your best bet is to just filter the events according to whether you think they are biologically meaningful.
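A post-hoc filter along these lines could look like the sketch below. It assumes a tab-separated differential table with columns named `Type`, `DeltaPsi` and `Probability`; the function name `filter_events`, the thresholds, and the choice to drop TS/TE events are illustrative assumptions, not a recommendation from the Whippet authors.

```python
import csv
import io


def filter_events(rows, drop_types=("TS", "TE"), min_abs_dpsi=0.1, min_prob=0.9):
    """Keep rows that pass the type, effect-size and probability filters.

    `rows` is an iterable of dicts keyed by column name (assumed names).
    """
    kept = []
    for row in rows:
        if row["Type"] in drop_types:
            continue  # event type the user does not trust
        if abs(float(row["DeltaPsi"])) < min_abs_dpsi:
            continue  # change too small to be of interest
        if float(row["Probability"]) < min_prob:
            continue  # not confident enough
        kept.append(row)
    return kept


# Toy table with made-up values, standing in for a real output file.
example = io.StringIO(
    "Gene\tType\tDeltaPsi\tProbability\n"
    "g1\tCE\t0.25\t0.97\n"
    "g2\tTS\t-0.40\t0.99\n"
    "g3\tTE\t0.30\t0.95\n"
    "g4\tAA\t0.05\t0.99\n"
    "g5\tRI\t-0.20\t0.80\n"
    "g6\tAD\t-0.30\t0.92\n"
)
rows = list(csv.DictReader(example, delimiter="\t"))
kept = filter_events(rows)
kept_genes = [row["Gene"] for row in kept]
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle (for example via `gzip.open(path, "rt")` if the table is gzip-compressed).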

@JamalEH One possible reason you're getting lots of significant TS/TE events but not splicing events is that you have a low read-depth dataset. Splicing events in Whippet require exon-exon junction-spanning reads by default, whereas TE/TS events use exon-body reads, so the depth required for statistical significance is fundamentally different for the two types. That doesn't mean that TS/TE events are more biologically prevalent in your sample, but they may be more technically prevalent given your sequencing depth.

timbitz, Jun 22 '20 05:06

@timbitz Thank you so much for your feedback!

Actually, I observed a similar frequency of TS/TE events in a dataset with 150M reads per library on average. The dataset is paired-end and stranded, with a read length of 76 bp. TS/TE events were prevalent in this dataset as well, and the problem is that most of them were very difficult to see in the IGV genome viewer (with the exception of a few easily seen events). I should say, however, that in the previous dataset almost 55% of the TE events I observed overlapped with CLIP-seq coordinates showing the binding of a splicing factor that was investigated in my study. This splicing factor is well known to regulate the usage of alternative polyadenylation sites.

I have a related question: why do we observe such a low overlap with other tools, such as SUPPA2, rMATS and PSIsigma, when considering the significance of the dPSI? In your opinion, what would be the best way to overlap the results of two different tools? This is true for all the splicing event types, including alternative first and last exons. The direction of splicing change (e.g. dPSI) between conditions is similar across the tools, but when filtering events on significance, the overlap (events significant in both tools) is extremely low, even when the same gene annotation model and the same read files are analysed!
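The comparison described above can be made concrete with a small sketch that separates agreement in direction from agreement in significance. It assumes each tool's results have already been reduced to a mapping from a shared event key to a `(dPSI, significant)` pair; how to build a key that matches across tools is the hard part and is not addressed here. The function name `compare_calls` and the toy values are hypothetical.

```python
def compare_calls(tool_a, tool_b):
    """Compare two result sets keyed by a shared event identifier.

    Each argument maps event key -> (dpsi, is_significant).
    Only events reported by both tools are compared.
    """
    shared = tool_a.keys() & tool_b.keys()
    # Events whose dPSI has the same sign in both tools.
    same_direction = sum(1 for k in shared if tool_a[k][0] * tool_b[k][0] > 0)
    sig_a = {k for k in shared if tool_a[k][1]}
    sig_b = {k for k in shared if tool_b[k][1]}
    sig_union = sig_a | sig_b
    return {
        "shared_events": len(shared),
        "direction_concordance": same_direction / len(shared) if shared else 0.0,
        "significant_in_both": len(sig_a & sig_b),
        "jaccard_significant": len(sig_a & sig_b) / len(sig_union) if sig_union else 0.0,
    }


# Made-up results for two tools; keys stand in for matched event coordinates.
tool_a = {"e1": (0.30, True), "e2": (-0.20, False), "e3": (0.15, True)}
tool_b = {"e1": (0.25, True), "e2": (-0.10, True), "e3": (0.20, False), "e4": (0.50, True)}
summary = compare_calls(tool_a, tool_b)
```

In this toy case every shared event changes in the same direction in both tools, yet only one event is significant in both, which mirrors the pattern reported in the question.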

Thank you and my apologies for the late feedback!

Best regards, Jamal.

JamalEH, Aug 25 '20 22:08