eemeter
Constant (average) counterfactual with eemeter daily matrix on certain datasets only.
Hi,
I'm using eemeter for a research project comparing reliability of metered savings from hourly and daily gas consumption data. I've been able to generate varying hourly counterfactuals for a set of publicly available data; however, I'm having trouble generating a varying counterfactual for daily consumption with some datasets. Instead of giving me a counterfactual that varies with temperature, I'm getting the average of the baseline meter period.
Have you ever come across an issue like this? Is this just a data issue, or is there an issue with the model?
Output from Dataset 1 (LCL-June2015v2_126)
Summary statistics from baseline period included for reference.
value
count 385.000000
mean 1.341932
std 0.924915
min 0.000000
25% 0.708000
50% 1.299000
75% 1.908000
max 7.219000
reporting_observed counterfactual_usage
2013-01-21 00:00:00+00:00 1.560 1.642287
2013-01-22 00:00:00+00:00 3.207 1.628237
2013-01-23 00:00:00+00:00 1.796 1.598183
2013-01-24 00:00:00+00:00 2.400 1.610889
2013-01-25 00:00:00+00:00 1.746 1.610034
2013-01-26 00:00:00+00:00 2.336 1.497270
2013-01-27 00:00:00+00:00 2.314 1.408208
2013-01-28 00:00:00+00:00 1.914 1.459275
2013-01-29 00:00:00+00:00 1.635 1.304730
2013-01-30 00:00:00+00:00 0.000 1.352743
Output from Dataset 2 (LCL-June2015v2_0)
Summary statistics from baseline period included for reference.
value
count 385.000000
mean 5.912592
std 2.664848
min 0.000000
25% 5.144000
50% 6.102000
75% 6.922000
max 23.399000
reporting_observed counterfactual_usage
2013-01-21 00:00:00+00:00 6.083 5.912592
2013-01-22 00:00:00+00:00 5.715 5.912592
2013-01-23 00:00:00+00:00 6.080 5.912592
2013-01-24 00:00:00+00:00 6.491 5.912592
2013-01-25 00:00:00+00:00 4.954 5.912592
2013-01-26 00:00:00+00:00 8.271 5.912592
2013-01-27 00:00:00+00:00 6.022 5.912592
2013-01-28 00:00:00+00:00 5.305 5.912592
2013-01-29 00:00:00+00:00 4.802 5.912592
2013-01-30 00:00:00+00:00 0.000 5.912592
Hi @jfenna - What you're seeing here is the "baseload-only" model being selected. Baseload-only is just that: a flat line with a constant counterfactual. It is a line of best fit with a single (intercept) parameter, so for a least-squares fit it comes out as the mean of the data used in the fit, which is why your Dataset 2 counterfactual matches the baseline mean of 5.912592. For gas, eemeter by default tries out the heating-only and baseload-only models and takes the one with the best fit (the highest r-squared).
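For anyone following along, the selection described above can be illustrated with a minimal sketch. This is not eemeter's API or its actual implementation: it is plain NumPy, and the function names, the balance-point grid, and the `min_r_squared` qualification threshold are all hypothetical stand-ins (the library may apply its own qualification checks that are not reproduced here). The usage and temperature series are synthetic and exist only to exercise the code.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination; 0.0 if y has no variance."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        return 0.0
    return 1.0 - np.sum((y - y_hat) ** 2) / ss_tot

def fit_baseload_only(usage):
    """Intercept-only least-squares fit: the prediction is the mean."""
    intercept = float(usage.mean())
    y_hat = np.full_like(usage, intercept)
    return {"model": "baseload_only", "intercept": intercept, "slope": 0.0,
            "balance_point": None, "r_squared": r_squared(usage, y_hat)}

def fit_heating_only(usage, temps, balance_points, min_r_squared=0.1):
    """Best usage ~ intercept + slope * HDD fit over a balance-point grid.

    A candidate qualifies only if its slope is positive and its r-squared
    reaches min_r_squared (a hypothetical threshold for this sketch).
    """
    best = None
    for bp in balance_points:
        hdd = np.maximum(bp - temps, 0.0)  # heating degree days
        X = np.column_stack([np.ones_like(hdd), hdd])
        coef, *_ = np.linalg.lstsq(X, usage, rcond=None)
        r2 = r_squared(usage, X @ coef)
        if coef[1] <= 0 or r2 < min_r_squared:
            continue
        if best is None or r2 > best["r_squared"]:
            best = {"model": "heating_only", "intercept": float(coef[0]),
                    "slope": float(coef[1]), "balance_point": float(bp),
                    "r_squared": r2}
    return best

def select_model(usage, temps, balance_points):
    """Pick the qualified candidate with the highest r-squared."""
    candidates = [fit_baseload_only(usage)]
    heating = fit_heating_only(usage, temps, balance_points)
    if heating is not None:
        candidates.append(heating)
    return max(candidates, key=lambda c: c["r_squared"])

def predict(model, temps):
    """Counterfactual usage for the given daily temperatures."""
    if model["model"] == "baseload_only":
        return np.full(len(temps), model["intercept"])
    hdd = np.maximum(model["balance_point"] - temps, 0.0)
    return model["intercept"] + model["slope"] * hdd

# Synthetic illustration only (not the LCL data from this thread).
rng = np.random.default_rng(0)
temps = rng.uniform(0.0, 25.0, 365)
grid = np.arange(10.0, 21.0)  # hypothetical balance points
usage_heating = 2.0 + 0.3 * np.maximum(15.5 - temps, 0.0) + rng.normal(0, 0.3, 365)
usage_flat = 6.0 + rng.normal(0, 1.0, 365)  # no temperature dependence

heating_choice = select_model(usage_heating, temps, grid)
flat_choice = select_model(usage_flat, temps, grid)
print(heating_choice["model"], flat_choice["model"])
```

When no heating candidate qualifies, the only model left is the flat one, and `predict` returns the same value for every day, which is the pattern in the Dataset 2 output above. Comparing the two candidates' r-squared values side by side (instead of only keeping the winner) is one way to see how close the heating-only fit came to being selected.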
Hi @philngo - thanks for clarifying so quickly, this is really helpful. If I wanted to compare the heating-only and baseload-only models, are you able to suggest a straightforward way to do so?
J
A new daily model has been released. It includes many bug fixes and feature improvements. Unfortunately, fitting these models individually is no longer something that is done, because we now use a lasso-regression-inspired penalization to automatically select the best of these 4 models.