Whippet.jl

Question regarding the relative proportion of TE events in a human transcriptome

Open JamalEH opened this issue 4 years ago • 10 comments

Dear Team,

I ran Whippet on my paired-end, unstranded, polyA+ RNA-seq dataset. All reads in all samples are 75 bp long, and the experiment was performed in triplicate per condition (2 conditions in total: one is a negative control and the other is a silencing of a lncRNA with LNA).

Whippet run options: defaults, except -r 10 -s2, which tells Whippet to consider only events whose total number of supporting reads is at least 10 in at least 2 samples of each condition.

Using the filtering criteria Probability > 0.95 and |dPSI| > 0.1 (10%), I observed that the most abundant differential splicing events in my dataset are TE events: 654 of 785 events in total.
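The filter described above can be sketched as follows. This is only a minimal illustration, not Whippet's own code: the records are invented, and the field names (`type`, `delta_psi`, `probability`) are assumptions standing in for whatever columns the real output table uses.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
events = [
    {"type": "TE", "delta_psi": 0.25, "probability": 0.99},
    {"type": "TE", "delta_psi": -0.15, "probability": 0.97},
    {"type": "CE", "delta_psi": 0.30, "probability": 0.96},
    {"type": "CE", "delta_psi": 0.05, "probability": 0.99},  # fails the |dPSI| cutoff
    {"type": "AA", "delta_psi": 0.40, "probability": 0.80},  # fails the probability cutoff
]


def filter_events(rows, min_prob=0.95, min_abs_dpsi=0.1):
    """Keep rows with probability > min_prob and |dPSI| > min_abs_dpsi."""
    return [
        r for r in rows
        if r["probability"] > min_prob and abs(r["delta_psi"]) > min_abs_dpsi
    ]


significant = filter_events(events)

# Tally the surviving events by type and express each as a fraction of the total.
counts = Counter(r["type"] for r in significant)
fractions = {t: n / len(significant) for t, n in counts.items()}
```

With the numbers reported in this post, 654 TE events out of 785 significant events is a fraction of about 0.83.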

Is this normal? Or is it the nature of the RNA-seq dataset (polyA+) that causes this high number of TE events? Since it is a human transcriptome, one would expect a higher number of CE events and only a few tandem polyadenylation site events.

How can I check whether the reported TE events are true positives? As a starting point I inspected the events in IGV, but since they do not involve a junction, it was difficult to see differences between the two compared conditions.

I would be very thankful if you could comment on this!

Thank you so much in advance! Kind regards, Jamal.

JamalEH, Feb 17 '20 13:02

Hi Jamal, Did you solve your problem? I recently ran into the same situation. What's your explanation? Thanks.

itszhengan, Mar 19 '20 18:03

Hi Zheng,

I did not solve the issue, and unfortunately I got no feedback from the authors, so I just stopped using Whippet due to this lack of documentation and support.

To answer the second part of your question: in my case I would expect some changes in 3'UTR length and polyadenylation site usage, as my experiment targets a splicing factor that has been described to regulate 3'UTR length. My concern was that almost 90% of the significant events are TE, and I don't know whether this was due to some bias towards this type of event.

To make a long story short, my advice is to take the following into consideration:

1. Your experiment: do you expect an alteration of 3'UTR length and polyA site usage?
2. The annotation used sometimes has an effect on the output: try changing the annotation (Ensembl vs. RefSeq, for example) and then compare.
3. You could also try other tools capable of reporting those events.

Kind regards, Jamal.

JamalEH, Mar 19 '20 18:03

Dear Zheng,

Sorry for the late reply!

If I understood the concept behind Whippet's splicing analysis correctly, the tool reports a cassette exon event as a node, where a node is a single exon together with the splice sites surrounding it. You can get the exon that is potentially differentially spliced from the exon file that was generated when you created your index. There you can map the node ID/number back to the exon involved. Maybe the author could comment better on this.
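The re-mapping step could look roughly like the sketch below. It is only an illustration: the table layout (gene, node number, coordinates) and every value in it are hypothetical, so the actual exon file produced during indexing should be inspected before adapting it.

```python
# Hypothetical rows standing in for the exon table written at index time.
# The (gene, node, coordinates) layout is an assumption, not the real format.
exon_rows = [
    ("GENE_A", 1, "chr1:1000-1200"),
    ("GENE_A", 2, "chr1:1500-1650"),
    ("GENE_B", 1, "chr2:300-480"),
]

# Build a lookup keyed by (gene, node number).
node_to_coord = {(gene, node): coord for gene, node, coord in exon_rows}


def lookup_node(gene, node):
    """Return the coordinates recorded for a node, or None if it is absent."""
    return node_to_coord.get((gene, node))
```

Keying on both gene and node number matters in this sketch because the same node number recurs across genes.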

I hope it helps. PS: did you finally clarify why you have a higher number of TE events in your dataset?

Kind regards, Jamal.

JamalEH, Mar 25 '20 09:03

Thank you. I recently read the Whippet preprint, and I think it explains the definition of a node better than the final paper; TE and TS are regarded as UTR, which also answers my other question about nodes.

Zheng An Administrative Assistant China-Japan Union Hospital of Jilin University

itszhengan, Mar 25 '20 16:03

Dear Zheng,

Thank you for your reply! So basically you would consider the TE events, the predominant events in your dataset, to be true events?

Kind regards, Jamal.

JamalEH, Mar 27 '20 09:03

Dear Zheng,

I have observed in the dPSI output file that some significant events have a length of 1 nucleotide (end coordinate minus start coordinate of the significant node). Did you observe the same thing in your dataset? Do you consider them true events, or will you ignore them?
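The length check described above can be sketched as below. The `chrom:start-end` coordinate string is an assumed format, and `node_span` and `is_short` are hypothetical helper names. Whether the coordinates are inclusive should be verified against the real output, since that decides whether a difference of 1 means one or two nucleotides.

```python
import re

# Assumed coordinate format: "chrom:start-end" with integer positions.
COORD_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def node_span(coord):
    """Return (end - start, inclusive length) for a 'chrom:start-end' string."""
    m = COORD_RE.match(coord)
    if m is None:
        raise ValueError(f"unrecognized coordinate string: {coord!r}")
    start, end = int(m["start"]), int(m["end"])
    diff = end - start
    # If both ends are inclusive, the node covers diff + 1 positions.
    return diff, diff + 1


def is_short(coord, max_diff=1):
    """Flag nodes whose end - start difference is at most max_diff."""
    return node_span(coord)[0] <= max_diff
```

Such a flag only marks very short nodes for inspection; whether to drop them remains a judgment call.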

Thank you! Best regards, Jamal.

JamalEH, Mar 31 '20 11:03

Hi Jamal,

Sorry for the late reply.

Did you observe the same thing in your dataset?

Yes, I also noticed these significant events. I think whether to consider them true events depends on your filtering criteria. As for me, I'll filter them out before further analysis. But if you want to classify alternative splicing events, TE and TS should be removed, because I didn't see any event types corresponding to TE and TS on https://github.com/timbitz/Whippet.jl (there are only six AS event types based on node types).

Best, Zheng

itszhengan, Apr 01 '20 14:04

Hi Zheng, Thank you for your reply!

Those events are still the most abundant and significant in my dataset, even when I tighten the filtering criteria (posterior P > 0.95 and |dPSI| > 0.2).

If you look at this picture showing the events, TE (and also TS) events are quantified based on the read coverage over each node in the compared conditions, which is why the figure does not show edges representing them. The author says that TE corresponds to alternative polyadenylation site usage within the 3'UTR of the genes. In my case, the genes showing these TE events are important and explain the cellular phenotype observed in my experiment.

What do you think?

Best regards, Jamal.

JamalEH, Apr 01 '20 14:04

Hi Jamal,

I believe what you discovered is called a "microexon" (I'm not familiar with this, so please tell me if I'm wrong). Here is a link: https://www.biostars.org/p/261324/

Best, Zheng

itszhengan, Apr 02 '20 02:04

Hi Zheng,

Thank you so much for your reply!

Indeed, you may be right: those very short events could be cryptic exons or microexons. But I'm not sure they could also be present within the 3'UTR if TE events, as stated in the tool's documentation, are alternative polyadenylation sites within the 3'UTR. I hope to hear a comment on this from the authors.

Kind regards, Jamal.

JamalEH, Apr 02 '20 09:04