dat-science
Replaced by https://github.com/github/scientist
Science is happening elsewhere!
This repository is historical. Up-to-date bits are over in github/scientist.
A Ruby library for carefully refactoring critical paths. Science isn't a feature flipper or an A/B testing tool. It's a pattern that helps measure and validate large code changes without altering behavior.
How do I do science?
Let's pretend you're changing the way you handle permissions in a large web app. Tests can help guide your refactoring, but you really want to compare the current and new behaviors live, under load.
require "dat/science"

class MyApp::Widget
  def allows?(user)
    experiment = Dat::Science::Experiment.new "widget-permissions" do |e|
      e.control { model.check_user(user).valid? } # old way
      e.candidate { user.can? :read, model } # new way
    end

    experiment.run
  end
end
Wrap a control block around the code's original behavior, and wrap candidate
around the new behavior. experiment.run will always return whatever the
control block returns, but it does a bunch of stuff behind the scenes:
- Decides whether or not to run candidate,
- Runs candidate before control 50% of the time,
- Measures the duration of both behaviors,
- Compares the results of both behaviors,
- Swallows any exceptions raised by the candidate behavior, and
- Publishes all this information for tracking and reporting.
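To make those steps concrete, here is a rough, self-contained sketch of what a run could look like. It is not the Dat::Science source: `SketchExperiment`, its in-memory `published` list, and the choice to let control exceptions propagate are all assumptions made for illustration.

```ruby
# Toy model of the steps listed above. NOT the Dat::Science source:
# SketchExperiment and everything in it is hypothetical.
class SketchExperiment
  attr_reader :published

  def initialize(name)
    @name = name
    @published = [] # [event, payload] pairs, kept in memory for inspection
  end

  def control(&block)
    @control = block
  end

  def candidate(&block)
    @candidate = block
  end

  # Decides whether or not to run the candidate.
  def enabled?
    true
  end

  def run
    return @control.call unless @candidate && enabled?

    # Run the candidate before the control about half of the time.
    order = [:control, :candidate]
    order.reverse! if rand(2).zero?

    results = {}
    order.each { |which| results[which] = observe(which) }

    # A candidate that raised never counts as a match.
    same = results[:candidate][:exception].nil? &&
           results[:control][:value] == results[:candidate][:value]

    payload = { :experiment => @name, :first => order.first }.merge(results)
    @published << [same ? :match : :mismatch, payload]

    results[:control][:value]
  end

  private

  # Times one behavior in milliseconds and captures its value or exception.
  def observe(which)
    block = which == :control ? @control : @candidate
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    value = exception = nil
    begin
      value = block.call
    rescue StandardError => e
      raise if which == :control # assumption: only candidate errors are swallowed
      exception = e
    end
    duration = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    { :duration => duration, :exception => exception, :value => value }
  end
end

experiment = SketchExperiment.new("widget-permissions")
experiment.control { :allowed }
experiment.candidate { raise "new code is broken" }
experiment.run # returns :allowed; the candidate's exception is only recorded
```

The point of the sketch is the shape of the flow: the caller only ever sees the control's value, and everything about the candidate ends up in the published payload.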
If you'd like a bit less verbosity, the Dat::Science#science helper
instantiates an experiment and calls run:
require "dat/science"

class MyApp::Widget
  include Dat::Science

  def allows?(user)
    science "widget-permissions" do |e|
      e.control { model.check_user(user).valid? } # old way
      e.candidate { user.can? :read, model } # new way
    end
  end
end
Making science useful
The examples above will run, but they're not particularly helpful. The
candidate block runs every time, and none of the results get
published. Let's fix that by creating an app-specific subclass of
Dat::Science::Experiment. This makes it easy to add custom behavior
for enabling/disabling/throttling experiments and publishing results.
require "dat/science"

module MyApp
  class Experiment < Dat::Science::Experiment
    def enabled?
      # See "Ramping up experiments" below.
    end

    def publish(name, payload)
      # See "Publishing results" below.
    end
  end
end
After creating a subclass, tell Dat::Science to instantiate it any time the
science helper is called:
Dat::Science.experiment = MyApp::Experiment
Controlling comparison
By default the results of the candidate and control blocks are compared
with ==. Use comparator to do something fancier:
science "loose-comparison" do |e|
  e.control { "vmg" }
  e.candidate { "VMG" }
  e.comparator { |a, b| a.downcase == b.downcase }
end
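A comparator is just a two-argument block that returns something truthy or falsy, so the logic can be tried on its own before wiring it into an experiment. The two lambdas below are hypothetical examples, not part of the library: one ignores element order, the other tolerates floating-point noise.

```ruby
# Hypothetical comparator bodies, written as lambdas so they run without the gem.
# Inside an experiment, the same logic would go in the block given to e.comparator.

# Treats two arrays as equal when they hold the same elements in any order.
SAME_MEMBERS = lambda { |a, b| a.sort == b.sort }

# Treats two floats as equal when they differ by less than a small tolerance.
CLOSE_ENOUGH = lambda { |a, b| (a - b).abs < 0.001 }

SAME_MEMBERS.call([3, 1, 2], [1, 2, 3]) # true, although == on the arrays is false
CLOSE_ENOUGH.call(0.1 + 0.2, 0.3)       # true, although == on the floats is false
```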
Ramping up experiments
By default the candidate block of an experiment will run 100% of the time.
This is often a really bad idea when testing live. Experiment#enabled? can be
overridden to run all candidates, say, 10% of the time:
def enabled?
  rand(100) < 10
end
Or, even better, use a feature flag library like Flipper. Delegating the decision makes it easy to define different rules for each experiment, and can help keep all your entropy concerns in one place.
def enabled?
  MyApp.flipper[name].enabled?
end
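If a feature flag library is more than you need, a per-experiment percentage table is one possible middle ground. This is only a sketch: the `Rollout` module and its `PERCENTAGES` table are hypothetical and not part of Dat::Science. An `enabled?` override could delegate to it with `Rollout.enabled?(name)`.

```ruby
# Hypothetical per-experiment rollout table; not part of Dat::Science.
module Rollout
  # Experiment name => percentage of runs that should include the candidate.
  PERCENTAGES = {
    "widget-permissions" => 10
  }

  # The roll is a parameter so the decision can be tested deterministically.
  # Experiments missing from the table never run their candidate.
  def self.enabled?(name, roll = rand(100))
    roll < PERCENTAGES.fetch(name, 0)
  end
end

Rollout.enabled?("widget-permissions") # true for roughly 10% of calls
```

Defaulting unknown experiments to 0% is a deliberate choice here: a new experiment stays dark until someone gives it a percentage.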
Publishing results
By default the results of an experiment are discarded. This isn't very useful.
Experiment#publish can be overridden to publish results via any
instrumentation mechanism, which makes it easy to graph durations or
matches/mismatches and store results. The only two events published by an
experiment are :match when the results of the control and candidate behaviors
are the same, and :mismatch when they aren't.
def publish(event, payload)
  MyApp.instrument "science.#{event}", payload
end
The published payload is a Symbol-keyed Hash:
{
  :experiment => "widget-permissions",
  :first => :control,
  :timestamp => <a-Time-instance>,
  :candidate => {
    :duration => 2.5,
    :exception => nil,
    :value => 42
  },
  :control => {
    :duration => 25.0,
    :exception => nil,
    :value => 24
  }
}
:experiment is the name of the experiment. :first is either :candidate or
:control, depending on which block was run first during the experiment.
:timestamp is the Time when the experiment started.
The :candidate and :control Hashes have the same keys:
- :duration is the execution time in ms, expressed as a float.
- :exception is a reference to any raised exception or nil.
- :value is the result of the block.
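As a rough idea of what a publish override might hand these payloads to, here is a small in-memory tally. `ResultTally` is hypothetical and stands in for whatever instrumentation you actually use. It relies only on the payload keys described above.

```ruby
# Hypothetical in-memory sink for published results; not part of Dat::Science.
class ResultTally
  attr_reader :counts, :speedups

  def initialize
    @counts = Hash.new(0) # [experiment name, event] => number of occurrences
    @speedups = []        # control duration minus candidate duration, in ms
  end

  def record(event, payload)
    @counts[[payload[:experiment], event]] += 1
    @speedups << payload[:control][:duration] - payload[:candidate][:duration]
  end
end

tally = ResultTally.new
tally.record(:mismatch,
  :experiment => "widget-permissions",
  :first      => :control,
  :timestamp  => Time.now,
  :candidate  => { :duration => 2.5,  :exception => nil, :value => 42 },
  :control    => { :duration => 25.0, :exception => nil, :value => 24 })

tally.counts   # one :mismatch recorded for "widget-permissions"
tally.speedups # positive numbers mean the candidate was faster
```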
Adding context
It's often useful to add more information to your results, and
Experiment#context makes it easy:
science "widget-permissions" do |e|
  e.context :user => user
  e.control { model.check_user(user).valid? } # old way
  e.candidate { user.can? :read, model } # new way
end
context takes a Symbol-keyed Hash of additional information to publish and
merges it with the default payload.
Keeping it clean
Sometimes the things you're comparing can be huge, and there's no good way
to do science against something simpler. Use a cleaner to publish a
simple version of a big nasty object graph:
science "huge-results" do |e|
  e.control { OldAndBusted.huge_results_for query }
  e.candidate { NewHotness.huge_results_for query }
  e.cleaner { |result| result.count }
end
The results of the control and candidate blocks will be run through the
cleaner. You could get the same behavior by calling count in the blocks,
but the cleaner makes it easier to keep things in sync. The original
control result is still returned.
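The split between what gets published and what gets returned can be sketched in a few lines of plain Ruby. `run_with_cleaner` is a hypothetical helper, not the library's code. It models only the two facts stated above: both values pass through the cleaner for publishing, and the caller still receives the original control result.

```ruby
# Hypothetical helper illustrating the cleaner flow; not part of Dat::Science.
# Returns the untouched control value plus the cleaned values that would be published.
def run_with_cleaner(control_value, candidate_value, cleaner)
  published = {
    :control   => { :value => cleaner.call(control_value) },
    :candidate => { :value => cleaner.call(candidate_value) }
  }
  [control_value, published]
end

count = lambda { |result| result.count }
returned, published = run_with_cleaner([:a, :b, :c], [:a, :b], count)

returned  # the full [:a, :b, :c] array, as the caller expects
published # only the counts, 3 for the control and 2 for the candidate
```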
What do I do with all these results?
Once you've started an experiment and published some results, you'll want to
analyze the mismatches from your experiment. Check out
dat-analysis where you'll find an
analysis toolkit to help you understand your experiment results.
Hacking on science
Be on a Unixy box. Make sure a modern Bundler is available. script/test runs
the unit tests. All development dependencies will be installed automatically if
they're not available. Dat science happens primarily on Ruby 1.9.3 and 1.8.7,
but science should be universal.
Maintainers
@jbarnette and @rick