Historical model variants

Open halfak opened this issue 8 years ago • 0 comments

It turns out that different modeling strategies produce different ranges of scoring probabilities, along with other differences in scorer model behavior. So that users aren't surprised by such changes, we should let them choose to keep using an old model even after a new one is deployed.

For example, the following URL gets a score for the "primary" model:

/scores/enwiki/damaging/123456789

A specific variant could be requested explicitly with a `variant` param:

/scores/enwiki/damaging/123456789?variant=gradient_boosting
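
As a rough sketch of how the lookup might behave (the `VARIANTS` registry and `resolve_variant` are hypothetical names for illustration, not existing ORES code), a request with no `variant` param would fall back to the primary variant, and an unknown variant would be rejected:

```python
from typing import Dict, Optional

# Hypothetical registry: for each modeling problem, the primary variant
# and the set of variants that are still being served.
VARIANTS: Dict[str, dict] = {
    "damaging": {
        "primary": "gradient_boosting",
        "available": {"gradient_boosting", "linear_svc_balanced"},
    },
}


def resolve_variant(model: str, variant: Optional[str] = None) -> str:
    """Return the variant to score with, defaulting to the primary one."""
    entry = VARIANTS[model]
    if variant is None:
        # No ?variant= param: behave exactly as the API does today.
        return entry["primary"]
    if variant not in entry["available"]:
        raise KeyError(f"unknown variant {variant!r} for model {model!r}")
    return variant
```

Defaulting to the primary variant keeps existing URLs working unchanged, while users who depend on an older variant's score ranges can pin it explicitly.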

We'd need to change the output for when model info is requested so that there can be multiple variants reported.

/scores/enwiki/damaging/ returns:

{
  "linear_svc_balanced": { ...model_info..., "primary": false},
  "gradient_boosting": { ...model_info..., "primary": true}
}
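
One way to assemble that response, assuming the per-variant model info is already available as plain dicts (`model_info_response` and the `version` values below are made up for illustration):

```python
from typing import Dict


def model_info_response(variant_info: Dict[str, dict], primary: str) -> Dict[str, dict]:
    """Annotate each variant's model info with a "primary" flag."""
    if primary not in variant_info:
        raise ValueError(f"primary variant {primary!r} has no model info")
    return {
        name: {**info, "primary": name == primary}
        for name, info in variant_info.items()
    }


# Placeholder model info; real entries would carry the full model_info.
info = model_info_response(
    {
        "linear_svc_balanced": {"version": "0.0.1"},
        "gradient_boosting": {"version": "0.0.2"},
    },
    primary="gradient_boosting",
)
```

Exactly one variant ends up with `"primary": true`, which is the one served when no variant is requested.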

This would also change the way we think about caching scores. Right now, a score is stored and retrieved based on a key "<context>:<model>:<version>:<rev_id>". We'd need to add "variant" to that: "<context>:<model>:<variant>:<version>:<rev_id>". This then raises the question: when we say "model", do we really mean *model*? We're now generalizing the concept of a "model" to a "modeling problem", e.g. "predict when an edit is damaging".
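
A minimal sketch of the extended key, assuming keys are built by simple string joining (`score_cache_key` is a hypothetical helper and the version string is a made-up example):

```python
def score_cache_key(context: str, model: str, variant: str,
                    version: str, rev_id: int) -> str:
    """Build a "<context>:<model>:<variant>:<version>:<rev_id>" cache key."""
    return ":".join([context, model, variant, version, str(rev_id)])


key = score_cache_key("enwiki", "damaging", "gradient_boosting",
                      "0.0.1", 123456789)
```

Because the variant is part of the key, scores from two variants of the same modeling problem never collide in the cache, even if their version strings happen to match.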

Under this scheme, we could still update the models by adding new sources of signal and making backwards-incompatible changes to `revscoring`, but the overall behavior of each variant should stay relatively consistent.

halfak, Mar 09 '16 20:03