
How to interpret the probability and max score in the output files?

Open MasterChief1O7 opened this issue 2 years ago • 2 comments

Hi, sorry if I am not supposed to ask such a question here; please let me know if there is a designated email or group.

Can you please elaborate on how I should interpret the probability and score that dREG gives to each peak? For example, I ran it on GROseq data from a mouse cell line, and many peaks have a probability of 0.0 or a very low value (~10E-15) but a dREG score of only 0.3 or 0.4. Does that mean something?

Also, is there a way to identify enhancers among all the peaks? Or should I just consider every detected peak outside a gene to be an enhancer?

Thanks

MasterChief1O7, Aug 09 '21 12:08

Dear Ankit,

No worries - this is fine!

> Can you please elaborate on how should I interpret the probability and score given by dREG to each peak? For example, I used it on GROseq data from mice cell line, and there were many peaks which have probability as 0.0 or very low (~10E-15) but dREG score such as 0.3 or 0.4, does that mean something?

dREG scores are the raw output of the SVR (support vector regression) model. Values near 1 represent a region that looks very much like a TIR (transcription initiation region); values near 0 represent a region that looks very much like either a gene body or an intergenic region where there is no evidence of transcription initiation. Values near 0.3 and 0.4 are also very likely TIRs.

The p-values (what you call probability) represent the probability of observing a dREG score >0 if the region is actually either a gene body or an intergenic region. These can be used much as you would use the p-values or (when corrected by FDR) the q-values provided by any other peak caller.
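If you want q-values from those p-values, a minimal sketch of a Benjamini-Hochberg correction in plain Python could look like the following. The peak tuples, their column order, and the 0.05 cutoff are assumptions made for illustration, not the dREG output format or a recommended threshold.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical peaks: (chrom, start, end, dREG score, p-value).
peaks = [
    ("chr1", 1000, 1400, 0.95, 0.0),
    ("chr1", 5000, 5300, 0.35, 1e-15),
    ("chr2", 2000, 2250, 0.30, 0.04),
    ("chr2", 9000, 9200, 0.28, 0.20),
]

qvals = benjamini_hochberg([p[4] for p in peaks])

# Keep peaks whose q-value passes an (assumed) 5% FDR cutoff.
significant = [p for p, q in zip(peaks, qvals) if q <= 0.05]
```

In this toy input the peak with a score of 0.35 passes the cutoff because its p-value is tiny, which is the situation described in the question: a modest raw score can still be highly significant.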

> Also is there a way to identify enhancers from all peaks? Or should I just consider every detected peak outside a gene as an enhancer?

We usually define candidate enhancers as dREG peaks that are located distally from an annotated transcription start site. How far from a TSS they have to be is arbitrary - lots of papers have used values >10kb (though this probably treats some proximal enhancers as promoters).
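As a rough illustration of that distance filter, the sketch below keeps peaks whose center lies more than a chosen distance from the nearest annotated TSS. The function names, the use of the peak center, the input layout, and the coordinates are all hypothetical; the 10 kb default simply mirrors the value mentioned above.

```python
from bisect import bisect_left


def distance_to_nearest_tss(center, sorted_tss):
    """Distance in bp from a peak center to the closest TSS in a sorted list."""
    if not sorted_tss:
        return float("inf")
    i = bisect_left(sorted_tss, center)
    # Only the neighbors just left and right of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(center - t) for t in candidates)


def candidate_enhancers(peaks, tss_by_chrom, min_distance=10_000):
    """Return peaks whose center is more than min_distance bp from any TSS."""
    sorted_tss = {chrom: sorted(sites) for chrom, sites in tss_by_chrom.items()}
    distal = []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        # Chromosomes with no annotated TSS give an infinite distance (kept as distal).
        if distance_to_nearest_tss(center, sorted_tss.get(chrom, [])) > min_distance:
            distal.append((chrom, start, end))
    return distal


# Hypothetical annotation and peaks, for illustration only.
tss_by_chrom = {"chr1": [10_000, 80_000]}
peaks = [
    ("chr1", 10_200, 10_600),  # about 400 bp from a TSS: promoter-proximal
    ("chr1", 40_000, 40_400),  # about 30 kb from the nearest TSS: distal
    ("chr1", 85_000, 85_400),  # about 5 kb from a TSS: below the cutoff
]

distal = candidate_enhancers(peaks, tss_by_chrom)
```

Lowering `min_distance` would recover more proximal enhancers at the cost of including more promoters, which is the trade-off noted above.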

Best, Charles



dankoc, Aug 09 '21 12:08

Dear Charles,

Thank you for the great explanation. So should I consider the score and the p-value to be roughly inversely related? That is how it looks in my data. Also, while manually checking the peak positions against the GROseq data, it seems that a dREG score threshold of 0.8 is a good value for filtering out false or somewhat weak peaks. Does that sound reasonable?

On another note, when I plot the distribution of dREG scores for peaks with p-value/probability = 0.0, I see a clearly bimodal distribution: peaks fall either below approximately 0.75 or above it (as in the attached image). This was consistent across 2 replicates of both samples I tried. But shouldn't peaks with p-value = 0 have a very high score?

Regards,
Ankit

[Image: score_pval_0]

MasterChief1O7, Aug 10 '21 14:08