
Is ZTest incorrectly calculating mean and σ for Normal Distribution?

Open · vegerot opened this issue 1 year ago · 0 comments

In https://github.com/livingsocial/HiveSwarm/blob/8345130e3da37555c0ca36da538bfd4f4ee3c834/src/main/java/com/livingsocial/hive/udf/ZTest.java#L23, the normal distribution we use has mean=0 and σ=1.

public class ZTest extends UDF {

    private static NormalDistribution distribution = new NormalDistributionImpl(); // mean=0, σ=1

    public double pval(double val) {
        try {
            return 2 * (1 - distribution.cumulativeProbability(val));
            // ... (excerpt ends here; full source at the link above)
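For reference, the excerpt computes 2 * (1 - Φ(val)), where Φ is the standard normal CDF. That is the two-sided p-value when val is a z-score that is zero or positive; for a negative val the same expression exceeds 1. Below is a minimal, dependency-free sketch of the calculation, assuming the input is a z-score. It swaps commons-math's NormalDistributionImpl for a hand-written Φ built on the Abramowitz and Stegun 7.1.26 erf approximation (absolute error below 1.5e-7) and uses |z| so the result stays in [0, 1]. The class name TwoSidedPValue is hypothetical, and whether ZTest's callers ever pass negative values is not stated in the issue.

```java
/** Sketch (hypothetical class): two-sided p-value for a z-score under N(0, 1). */
public class TwoSidedPValue {

    // erf approximation from Abramowitz and Stegun, formula 7.1.26 (|error| < 1.5e-7).
    static double erf(double x) {
        double sign = Math.signum(x);
        x = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-x * x));
    }

    // Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    static double phi(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    // Two-sided p-value: P(|Z| >= |z|) = 2 * (1 - Phi(|z|)).
    // Taking |z| keeps the result in [0, 1] when the statistic is negative.
    static double pval(double z) {
        return 2.0 * (1.0 - phi(Math.abs(z)));
    }

    public static void main(String[] args) {
        System.out.printf("p(1.96)  = %.4f%n", pval(1.96));
        System.out.printf("p(-1.96) = %.4f%n", pval(-1.96));
        System.out.printf("p(0)     = %.4f%n", pval(0.0));
    }
}
```

Running main prints 0.0500 for both z = 1.96 and z = -1.96, and 1.0000 for z = 0, matching the familiar 5% two-sided threshold at |z| = 1.96.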

However, shouldn't we instead be using mean=controlAvg and σ=controlStddev?

I'm a stats n00b, so I'm probably missing something obvious. Does the criticalValue calculation standardize the data (i.e., turn it into a z-score with mean 0 and σ 1) so that the standard normal is the right distribution to use?
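The general fact behind the question: for X ~ N(μ, σ), P(X ≤ x) = Φ((x − μ)/σ). So if the value passed to pval is already a standardized statistic (the observed difference divided by its standard error, e.g. z = (meanT − meanC) / sqrt(sdT²/nT + sdC²/nC) for a common two-sample test of means), looking it up in N(0, 1) gives the same answer as looking up the raw difference in a normal with the matching mean and spread. Note that for a difference of means the relevant spread is the standard error, not controlStddev itself. Whether criticalValue actually computes such a z-score is not shown in the excerpt. The hypothetical sketch below only checks the identity numerically: it integrates the N(μ, σ) density directly with Simpson's rule and compares the result with Φ of the standardized value. The μ, σ and x values are arbitrary examples.

```java
/** Sketch (hypothetical class): a CDF lookup in N(mu, sigma) equals Phi((x - mu) / sigma). */
public class Standardization {

    // erf approximation from Abramowitz and Stegun, formula 7.1.26 (|error| < 1.5e-7).
    static double erf(double x) {
        double sign = Math.signum(x);
        x = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-x * x));
    }

    // Standard normal CDF Phi(z).
    static double phi(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    // Density of N(mu, sigma) at x.
    static double normalPdf(double x, double mu, double sigma) {
        double u = (x - mu) / sigma;
        return Math.exp(-0.5 * u * u) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    // CDF of N(mu, sigma) via composite Simpson's rule on [mu - 12*sigma, x].
    // It integrates the density directly instead of calling phi(), so it is an
    // independent check. Assumes x > mu - 12*sigma; the tail below that is negligible.
    static double normalCdfByIntegration(double x, double mu, double sigma) {
        double a = mu - 12.0 * sigma;
        int n = 20000; // number of subintervals; must be even for Simpson's rule
        double h = (x - a) / n;
        double sum = normalPdf(a, mu, sigma) + normalPdf(x, mu, sigma);
        for (int i = 1; i < n; i++) {
            sum += (i % 2 == 1 ? 4.0 : 2.0) * normalPdf(a + i * h, mu, sigma);
        }
        return sum * h / 3.0;
    }

    public static void main(String[] args) {
        double mu = 10.0, sigma = 2.5, x = 13.7; // arbitrary example values
        double direct = normalCdfByIntegration(x, mu, sigma);
        double standardized = phi((x - mu) / sigma);
        System.out.printf("P(X <= x) in N(mu, sigma): %.7f%n", direct);
        System.out.printf("Phi((x - mu) / sigma):     %.7f%n", standardized);
    }
}
```

The two printed values agree to within roughly 1e-7, the accuracy of the erf approximation, which is why a correctly standardized statistic can be looked up in the standard normal.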

vegerot, Jul 18 '23 20:07