HiveSwarm
Is ZTest incorrectly calculating mean and σ for Normal Distribution?
In https://github.com/livingsocial/HiveSwarm/blob/8345130e3da37555c0ca36da538bfd4f4ee3c834/src/main/java/com/livingsocial/hive/udf/ZTest.java#L23, the normal distribution we use has mean=0 and σ=1.
public class ZTest extends UDF {
    private static NormalDistribution distribution = new NormalDistributionImpl(); // mean=0, σ=1
    public double pval(double val){
        try {
            return 2 * (1 - distribution.cumulativeProbability(val));
However, shouldn't we instead be using mean=controlAvg and σ=controlStddev?
I'm a stats n00b so I'm probably missing something obvious. Does the criticalValue calculation normalize the data such that the mean and σ are standard?
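To make the question concrete, here is a minimal sketch of the identity it hinges on: if the value passed to pval is already standardized as z = (x - mean) / σ, then the standard normal (mean=0, σ=1) gives the same tail probability as a normal with the original mean and σ evaluated at x. This is not HiveSwarm code. The class name, helper names, and numbers are made up for illustration, and it integrates the density numerically instead of calling commons-math so that it runs standalone. Whether criticalValue actually standardizes this way is not visible in the excerpt above.

```java
// Hypothetical sketch, not part of HiveSwarm. Standard library only.
class StandardizeSketch {

    // Density of a normal distribution with the given mean and standard deviation.
    static double pdf(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    // P(X <= x) for X ~ N(mean, sd), by composite Simpson's rule
    // starting 10 standard deviations below the mean (the tail beyond is negligible).
    static double cdf(double x, double mean, double sd) {
        double lo = mean - 10 * sd;
        if (x <= lo) return 0.0;
        int n = 20000; // number of subintervals, must be even
        double h = (x - lo) / n;
        double sum = pdf(lo, mean, sd) + pdf(x, mean, sd);
        for (int i = 1; i < n; i++) {
            sum += (i % 2 == 0 ? 2 : 4) * pdf(lo + i * h, mean, sd);
        }
        return sum * h / 3;
    }

    // Same formula as ZTest.pval, but with mean and sd as explicit parameters.
    static double twoSidedPval(double x, double mean, double sd) {
        return 2 * (1 - cdf(x, mean, sd));
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only.
        double controlAvg = 50.0;
        double controlStddev = 4.0;
        double observed = 57.0;

        // Standardize the observation.
        double z = (observed - controlAvg) / controlStddev;

        // Both lines should print (numerically) the same p-value.
        System.out.println(twoSidedPval(observed, controlAvg, controlStddev));
        System.out.println(twoSidedPval(z, 0.0, 1.0));
    }
}
```

If the two printed values agree, using mean=0 and σ=1 is fine as long as the input to pval has already been standardized. If pval were given a raw, unstandardized value, the default distribution would give a different answer.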