abanalyzer
abanalyzer copied to clipboard
A/B test analysis library for Ruby - performs Chi-Square tests and G-tests on A/B results - New location for https://github.com/livingsocial/abanalyzer
= ABAnalyzer
{}[https://travis-ci.org/bmuller/abanalyzer]
ABAnalyzer is a Ruby library that will perform testing to determine if there is a statistical difference in categorical data (typically called an A/B Test). By default, it uses a {G-Test for independence}[http://en.wikipedia.org/wiki/G-test], but a {Chi-Square test for independence}[http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test] can also be used.
== Installation Simply run: gem install abanalyzer
== Basic Usage The simplest test (which uses a gtest):
require 'abanalyzer'
values = {} values[:agroup] = { :opened => 100, :unopened => 300 } values[:bgroup] = { :opened => 50, :unopened => 350 }
tester = ABAnalyzer::ABTest.new values
Are the two different? Returns true or false (at 0.05 level of significance)
puts tester.different?
== Multiple Categories You can use the ABAnalyzer module to test for differences in more than two categories. For instance, to test accross three:
require 'abanalyzer'
values = {} values[:agroup] = { :male => 200, :female => 250 } values[:bgroup] = { :male => 150, :female => 300} values[:cgroup] = { :male => 50, :female => 50 }
tester = ABAnalyzer::ABTest.new values
Are the two different? Returns true or false (at 0.05 level of significance)
puts tester.different?
== Tests Available You can get the actual p-value for either a Chi-Square test for independence or a G-Test for independence.
... tester = ABAnalyzer::ABTest.new values puts tester.chisquare_p puts tester.gtest_p
You can additionally get the actual score for either a Chi-Square test for independence or a G-Test for independence.
... tester = ABAnalyzer::ABTest.new values puts tester.chisquare_score puts tester.gtest_score
== Sample Size Calculations Let's say you want to determine how large your sample size needs to be for an A/B test. Let's say your baseline is 10%, and you want to be able to determine if there's at least a 10% relative lift (1% absolute) to 11%. Let's assume you want a power[http://en.wikipedia.org/wiki/Statistical_power] of 0.8 and a {significance level}[http://en.wikipedia.org/wiki/Statistical_significance] of 0.05 (that is, an 80% chance of that you'll succeed in recognizing a difference when there is one, and a 5% chance of a false negative).
... ABAnalyzer.calculate_size(0.1, 0.11, 0.05, 0.8) => 14751
This means that you will need at least 14,751 people in each group sample. You can see this same example with R at {on the 37 signals blog}[http://37signals.com/svn/posts/3004-ab-testing-tech-note-determining-sample-size].
== Confidence Intervals You can also get a {confidence interval}[http://en.wikipedia.org/wiki/Confidence_interval]. Let's say you have the results of a test where there were 711 successes out of 4000 trials. To get a 95% confidence interval of the "true" value of the conversion rate, use:
... ABAnalyzer.confidence_interval(711, 4000, 0.95) => [0.1659025512617185, 0.1895974487382815]
This means (roughly) that if you ran this experiment over and over, 95% of the time the resulting proportion would be between 17% and 19%.
You can also determine what the relative confidence intervals would be. Let's say that your old conversion rate was 13%, and you wanted to know what sort of relative lift you could get.
... ABAnalyzer.relative_confidence_interval(711, 4000, 0.13, 0.95) => [0.27617347124398833, 0.45844191337139606]
This means (roughly) that if you ran this experiment over and over, 95% of the time the resulting proportion would be a relative lift of between 28% and 46%. Go buy yourself a beer!
== Running Tests Testing can be run by using: bundle exec rake