Historical model variants
It turns out that different modeling strategies produce different ranges of scoring probabilities, along with other differences in scorer model behavior. So that such changes don't surprise users, we should allow them to keep using an old model even after a new one is deployed.
For example, the following URL gets a score for the "primary" model:
/scores/enwiki/damaging/123456789
A specific variant could be requested explicitly with a `variant` param:
/scores/enwiki/damaging/123456789?variant=gradient_boosting
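As a rough client-side sketch of the two URL forms above: the helper name `score_path` is hypothetical and not part of ORES. It only illustrates that omitting `variant` would fall back to the "primary" model.

```python
from urllib.parse import urlencode


def score_path(context, model, rev_id, variant=None):
    """Build a score path; `variant` is the proposed, optional parameter.

    Hypothetical helper for illustration only (not an existing ORES API).
    """
    path = f"/scores/{context}/{model}/{rev_id}"
    if variant is not None:
        # Only add the query string when a variant is explicitly requested,
        # so existing URLs keep resolving to the "primary" model.
        path += "?" + urlencode({"variant": variant})
    return path


primary_url = score_path("enwiki", "damaging", 123456789)
variant_url = score_path("enwiki", "damaging", 123456789,
                         variant="gradient_boosting")
```

Keeping the parameter optional means existing clients would not need to change anything.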
We'd need to change the output returned when model info is requested so that multiple variants can be reported.
/scores/enwiki/damaging/
returns:
{
"linear_svc_balanced": { ...model_info..., "primary": false},
"gradient_boosting": { ...model_info..., "primary": true}
}
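One way the server might pick a variant from a structure like this is sketched below. It assumes exactly one variant is flagged `"primary": true`. The function name `resolve_variant` and the error handling are assumptions made for illustration, not existing ORES code.

```python
def resolve_variant(variants, requested=None):
    """Return the name of the variant to use for a request.

    `variants` maps variant name -> model info dict with a "primary" flag,
    mirroring the proposed model-info output. Hypothetical helper.
    """
    if requested is not None:
        if requested not in variants:
            raise KeyError(f"Unknown variant: {requested!r}")
        return requested

    # No explicit variant: fall back to the one marked as primary.
    primaries = [name for name, info in variants.items()
                 if info.get("primary")]
    if len(primaries) != 1:
        raise ValueError("Expected exactly one primary variant, "
                         f"found {len(primaries)}")
    return primaries[0]


# Model info is elided here; only the "primary" flag matters for the sketch.
model_variants = {
    "linear_svc_balanced": {"primary": False},
    "gradient_boosting": {"primary": True},
}
default_variant = resolve_variant(model_variants)
explicit_variant = resolve_variant(model_variants, "linear_svc_balanced")
```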
This would also change the way we think about caching scores. Right now, a score is stored and retrieved based on the key "<context>:<model>:<version>:<rev_id>". We'd need to add "variant" to that: "<context>:<model>:<variant>:<version>:<rev_id>". This then raises the question: when we say "model", do we really mean *model*? We're now generalizing the concept of a "model" to a "modeling problem", e.g. "predict when an edit is damaging".
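A minimal sketch of the extended cache key, assuming the key is simply the colon-joined fields in the order given above. The function name `cache_key` and the version string `"0.0.1"` are placeholders, not values taken from ORES.

```python
def cache_key(context, model, variant, version, rev_id):
    """Build "<context>:<model>:<variant>:<version>:<rev_id>".

    Hypothetical helper: including the variant keeps scores from
    different variants of the same modeling problem from colliding.
    """
    return ":".join(str(part) for part in
                    (context, model, variant, version, rev_id))


# "0.0.1" is a made-up version string used only for this example.
key_gb = cache_key("enwiki", "damaging", "gradient_boosting",
                   "0.0.1", 123456789)
key_svc = cache_key("enwiki", "damaging", "linear_svc_balanced",
                    "0.0.1", 123456789)
```

Under this assumption, two variants scoring the same revision get distinct keys, so switching the primary variant would not serve stale scores from the old one.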
Under this scheme, we could still update the models by adding new sources of signal and making backwards-incompatible changes to `revscoring`, but the overall behavior of each variant should stay relatively consistent.