
Statistics with explicitly defined background

Open OrangeyO2 opened this issue 1 year ago • 0 comments

Hello

Thank you for developing/maintaining the package and for the detailed documentation.

I have a question about which statistics are appropriate when running rGREAT with an explicitly defined background.

According to the documentation,

Unlike the similarly-named hypergeometric test over genes used with a whole genome background, it is not biased by the differing sizes of the regulatory domains of genes.

Later in the documentation, the binomial p-value is preferred over the hypergeometric one because of the biases in the hypergeometric test, but I suppose this no longer holds when we run with a defined background?

By default, GREAT ranks results by the binomial p-value, and we consider this the single best way to examine genome-wide cis-regulatory datasets. It accounts for biases in gene regulatory domain size and provides an accurate picture of the cis-regulatory landscape. Still, it is important to examine other statistics too:

I am a little confused because in the vignette, towards the bottom:

In fact, the native hypergeometric method in GREAT can be approximated to the binomial method here. Nevertheless, the binomial method is more general and it has no restriction as the hypergeometric method where input regions must be perfect subsets of backgrounds.
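To make the comparison concrete, here is a minimal sketch of the two tail probabilities as I understand them, written in plain Python rather than with rGREAT. All counts are hypothetical, and the setup (counting regions, with the binomial success probability taken as the fraction of background regions annotated to the term) is my assumption for illustration, not necessarily how rGREAT computes either value.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n regions without replacement
    from N background regions, K of which hit the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts, chosen only for illustration.
N = 10000  # background regions
K = 500    # background regions associated with the term
n = 200    # input regions (a subset of the background)
k = 20     # input regions associated with the term

p_hyper = hypergeom_sf(k, N, K, n)
p_binom = binom_sf(k, n, K / N)  # assumed success probability: K / N

print(f"hypergeometric: {p_hyper:.3g}")
print(f"binomial:       {p_binom:.3g}")
```

With an input set that is small relative to the background, sampling without replacement is close to sampling with replacement, so I would expect the two p-values to be similar in a case like this.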

When we run rGREAT with an explicitly defined background:
Q1) Is p_adjust from the binomial test or from the modified hypergeometric test?
Q2) Which is recommended, p_adjust or p_adjust_hyper?

Thank you!

OrangeyO2, Jan 09 '24 18:01