Enhance usage config to include score bounds/type
Is your feature request related to a problem? Please describe
When describing model outcomes, we expect a threshold to come from the user, but it's not always clear what type of output we have (0-100 versus 0.00 to 1.00, i.e. percentiles versus probabilities).
Describe the solution you'd like
For clarity, output configuration should be more than just a column name: it should also include the expected valid values, which Seismometer can filter on or provide as limits. This is similar to how we use a custom class for the cohort columns.
For example:
```yaml
primary_output: Primary Model Score
outputs:
  - source: ScoreName1
    display_name: Primary Model Score
    type: numeric
    lower_bound: 0
    upper_bound: 1
  - source: ScoreName2
    display_name: A Second Score
    type: numeric
    lower_bound: 0
    upper_bound: 100
  - source: ScoreName3
    display_name: A Third Score for completeness
    type: numeric
    lower_bound: -12
    upper_bound: 144
```
Where type could potentially be extended to allow categorical scores in the future.
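A structured config like the YAML above could be modeled with a small class, similar to the existing cohort-column class. This is a hypothetical sketch only; the class and field names mirror the YAML example and are not Seismometer's actual API:

```python
from dataclasses import dataclass

@dataclass
class OutputColumn:
    """Hypothetical structured config for one model output column."""
    source: str
    display_name: str
    type: str = "numeric"      # could later be extended to "categorical"
    lower_bound: float = 0.0
    upper_bound: float = 1.0

    def __post_init__(self):
        # Basic validation: bounds must describe a non-empty range.
        if self.lower_bound >= self.upper_bound:
            raise ValueError("lower_bound must be less than upper_bound")

# The three example scores from the YAML above:
outputs = [
    OutputColumn("ScoreName1", "Primary Model Score", "numeric", 0, 1),
    OutputColumn("ScoreName2", "A Second Score", "numeric", 0, 100),
    OutputColumn("ScoreName3", "A Third Score for completeness", "numeric", -12, 144),
]
```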
Describe alternatives you've considered
Additional context
@gbowlin Are you thinking of this as more of a transformation or validation?
For transformation I think there'd be at least two strategies needed: do we clip values to the range, or clear those outside it?
Validation seems more like a drop-value option, but then the question is whether to drop just the value or the entire row.
And of course inclusive vs exclusive vs mixed?
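To make the drop-value vs drop-row distinction concrete, here is a minimal pandas sketch (the frame and column names are illustrative, not Seismometer's data model):

```python
import pandas as pd

# Hypothetical frame; "Score" and "Cohort" are illustrative names only.
df = pd.DataFrame({"Score": [0.5, 1.3, 0.9], "Cohort": ["a", "b", "c"]})
in_bounds = df["Score"].between(0, 1)  # inclusive on both ends by default

# Drop the value: keep the row, replace the out-of-range score with NaN.
drop_value = df.assign(Score=df["Score"].where(in_bounds))

# Drop the row: the whole observation disappears from downstream analysis.
drop_row = df[in_bounds]
```

The difference matters for cohort counts: dropping only the value keeps the row available for non-score summaries, while dropping the row removes it everywhere.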
For bounds I would assume we follow the numpy standard of include lower bound exclude upper bound, and if we need to we can come back with an option to select how you want bounds to work.
For clip vs clear (set to NaN), this should probably be an option, with clip being the default.
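A minimal sketch of the two strategies, using numpy and the include-lower/exclude-upper convention mentioned above (variable names are illustrative):

```python
import numpy as np

scores = np.array([-0.1, 0.5, 1.0, 1.3])
lower, upper = 0.0, 1.0

# Clip (proposed default): pull out-of-range values to the nearest bound.
clipped = np.clip(scores, lower, upper)

# Clear: set anything outside the half-open range [lower, upper) to NaN.
in_bounds = (scores >= lower) & (scores < upper)
cleared = np.where(in_bounds, scores, np.nan)
```

Note that `np.clip` keeps a value exactly at the upper bound, while the half-open mask excludes it, so the exact bound semantics would need to be pinned down per strategy.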
May break this out into a separate issue, but want to track that annotations can be improved alongside discretization of score bounds.
We currently have a couple of methods in lines that add labels to our plots, with mostly baked-in assumptions that scores are 0-1. Most of the issues this has caused have been around scaling.
Once we handle scaling more consistently, closer to data loading, functions/plots can have stronger opinions - the annotations would likely assume percentiles (and label with percentages).