triage
Metric calculation is bogus
The precision calculation currently takes predictions for several as-of dates and computes precision across all of them together, which gives bogus results. We need to look at how to compute it for each as-of date separately and then aggregate, or do something more reasonable.
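To make the concern concrete, here is a minimal sketch (not triage's actual evaluation code) comparing precision over the top-scored rows when all as-of dates are pooled versus when each date is scored separately and then averaged. The data, the `precision_at_k` helper, and the choice of `k` are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def precision_at_k(pairs, k):
    """Fraction of positive labels among the k highest-scored (score, label) pairs."""
    top = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
    return sum(label for _, label in top) / len(top)

# Hypothetical predictions: (as_of_date, score, label)
predictions = [
    ("2018-01-01", 0.9, 1), ("2018-01-01", 0.8, 1),
    ("2018-01-01", 0.7, 0), ("2018-01-01", 0.6, 0),
    ("2018-02-01", 0.5, 1), ("2018-02-01", 0.4, 0),
    ("2018-02-01", 0.3, 0), ("2018-02-01", 0.2, 0),
]

# Pooled: rank every row from every date together, take the top 4.
pooled = precision_at_k([(s, l) for _, s, l in predictions], k=4)

# Per date: rank within each as-of date, take the top 2 of each.
by_date = defaultdict(list)
for as_of_date, score, label in predictions:
    by_date[as_of_date].append((score, label))
per_date = {d: precision_at_k(pairs, k=2) for d, pairs in by_date.items()}
averaged = mean(per_date.values())

print(pooled, per_date, averaged)
```

In this toy data the pooled top 4 all come from January because its scores happen to be higher, so February's predictions never influence the pooled number.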
Not actionable as written. Closing, can reopen with more details if needed.
Given the following temporal configuration:
temporal_config:
    feature_start_time: '2010-01-04'
    feature_end_time: '2019-01-01'
    label_start_time: '2015-02-01'
    label_end_time: '2019-01-01'
    model_update_frequency: '1y'
    training_label_timespans: ['1month']
    training_as_of_date_frequencies: '1month'
    test_durations: '1y'
    test_label_timespans: ['1month']
    test_as_of_date_frequencies: '1month'
Resulting in the following temporal configuration:
As you can see, the trained model will make predictions for 12 different as-of dates in the test set.
Should we get 12 different metric calculations? An array? Just the total one?
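For reference, a rough sketch of where the 12 comes from: a 1-year test duration sampled at a 1-month frequency. This assumes a half-open window (the end date is excluded) and a hypothetical start date of 2018-01-01; the real as-of dates come from triage's own temporal logic, which may treat the endpoints differently. `add_months` and `enumerate_as_of_dates` are illustrative helpers, not triage functions.

```python
from datetime import date

def add_months(d, months):
    """Shift a date forward by whole months (assumes the day exists in the target month)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)

def enumerate_as_of_dates(start, duration_months, frequency_months):
    """List as-of dates in [start, start + duration) at the given monthly frequency."""
    end = add_months(start, duration_months)
    dates = []
    current = start
    while current < end:
        dates.append(current)
        current = add_months(current, frequency_months)
    return dates

# Hypothetical test window start; '1y' duration and '1month' frequency from the config above.
as_of_dates = enumerate_as_of_dates(date(2018, 1, 1), duration_months=12, frequency_months=1)
print(len(as_of_dates), as_of_dates[0], as_of_dates[-1])
```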
My feeling on this is that there should be a different set of parameters in your temporal config, test_frequency and test_interval or somesuch, that determines how many and which test matrices your model is evaluated on, and the test_duration and test_example_frequency are for how many and which dates to perform a single evaluation on (whether combining all of the dates in the way currently done makes sense is, I think, debatable). When we initially wrote the test_duration and test_example_frequency keys, we were thinking of cases where test predictions are also event-based, so each date may be sparsely labeled and combining multiple dates is necessary.
I feel like there are already issues to this effect somewhere.
Ah, yes, I said the same thing in #378. Doesn't make me right, just consistent. :)
Another thought on this: We are doing evaluations the same way (making one evaluation over all dates) in both test and train. For EWS problems, presumably, this method is equally bogus in both train and test. Should there be a flag to control this behavior?
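One way such a flag could look, as a sketch only: the `per_as_of_date` argument, the `evaluate` function, and the `precision_at_threshold` metric below are all hypothetical names, not part of triage. The same function could be called for both train and test matrices.

```python
from collections import defaultdict

def precision_at_threshold(pairs, threshold=0.5):
    """Fraction of positives among (score, label) pairs scored at or above the threshold."""
    flagged = [label for score, label in pairs if score >= threshold]
    return sum(flagged) / len(flagged) if flagged else None

def evaluate(rows, metric, per_as_of_date=False):
    """Evaluate (as_of_date, score, label) rows either pooled or separately per date.

    per_as_of_date=False mimics the current behavior (one number over all dates);
    per_as_of_date=True returns a dict mapping each as-of date to its own metric value.
    """
    rows = list(rows)
    if not per_as_of_date:
        return metric([(score, label) for _, score, label in rows])
    grouped = defaultdict(list)
    for as_of_date, score, label in rows:
        grouped[as_of_date].append((score, label))
    return {d: metric(pairs) for d, pairs in sorted(grouped.items())}

# Hypothetical rows for two as-of dates
rows = [
    ("2018-01-01", 0.9, 1), ("2018-01-01", 0.6, 0),
    ("2018-02-01", 0.7, 1), ("2018-02-01", 0.2, 0),
]
pooled_result = evaluate(rows, precision_at_threshold)
per_date_result = evaluate(rows, precision_at_threshold, per_as_of_date=True)
print(pooled_result, per_date_result)
```

Returning a dict in the per-date case leaves the choice of aggregation (mean, min, the full array) to the caller, which is the open question raised earlier in this thread.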