eemeter
Advice on how to use eemeter
Hello,
I'm interested in applying the eemeter library to winter energy usage data for a building. I get consistent results with monthly and weekly data, but I'm having difficulty applying the model to daily data.
Data
Data are for a building in Italy for years 2019, 2020 and 2021 (data.zip). The meter data report the number of instants in which the heating machine was on. Below you can find a plot of the data.
Weekly and monthly models
Aggregating the data to obtain monthly and weekly frequencies, the resulting models make sense. This is the code that I'm using to generate the model.
import datetime
import pytz
import pandas as pd
import matplotlib.pyplot as plt
import eemeter
# Load data
meter_data_path = "./meter_data.csv"
temp_data_path = "./temperature_data.csv"
meter_data = pd.read_csv(meter_data_path, index_col=0)
meter_data.index = pd.to_datetime(meter_data.index)
temp_data = pd.read_csv(temp_data_path, index_col=0)
temp_data.index = pd.to_datetime(temp_data.index)
temp_data = temp_data.resample("1H").mean().interpolate(method="linear").value
temp_data = temp_data.loc[temp_data.index >= datetime.datetime(2019, 1, 1, tzinfo=pytz.utc)]
# Define parameters: "W" for weekly model, "M" for monthly model
time_freq = "W"
use_billing_presets = True
weights_col = "num_days"
# Aggregate meter data
meter_data_agg = meter_data.value.dropna().resample(time_freq).agg(["sum", "size"])
meter_data_agg["num_days"] = meter_data_agg["size"] / 24
meter_data_agg = meter_data_agg.rename(columns={"sum": "value"})
# Create caltrack billing design matrix and extract baseline data
data = eemeter.create_caltrack_billing_design_matrix(meter_data_agg, temp_data)
baseline_data = eemeter.get_baseline_data(
    data,
    start=datetime.datetime(2019, 1, 1, tzinfo=pytz.utc),
    end=datetime.datetime(2019, 12, 31, tzinfo=pytz.utc),
    max_days=None,
)
# Add weights column to baseline data
baseline_df = baseline_data[0]
baseline_df[weights_col] = meter_data_agg[weights_col]
# Fit Caltrack model
model_results = eemeter.fit_caltrack_usage_per_day_model(
    baseline_data[0],
    use_billing_presets=use_billing_presets,
    weights_col=weights_col,
)
# Plot resulting model
fig, ax = plt.subplots(2, 1, figsize=(12, 8))
ax[0].set_title("Reference period")
eemeter.plot_energy_signature(
    meter_data_agg.loc[meter_data_agg.index <= datetime.datetime(2020, 1, 1, tzinfo=pytz.utc)],
    temp_data,
    ax=ax[0],
)
model_results.plot(ax=ax[0], with_candidates=False)
ax[1].set_title("Whole dataset")
eemeter.plot_energy_signature(meter_data_agg, temp_data, ax=ax[1])
model_results.plot(ax=ax[1], with_candidates=False)
fig.subplots_adjust(hspace=0.5)
plt.show()
The above code generates the following two figures (setting time_freq to "W" and "M" respectively).
Weekly
Monthly
Daily data
The daily data show a strong dependence on the day of the week with a very different pattern between weekdays and weekends (see image below).
Consequently, when I fit the CalTRACK daily model, I obtain a model that underestimates the weekday values and overestimates the weekend values.
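One way to put a number on this bias is to average the residuals (observed minus predicted) separately for weekdays and weekends. The following is only a minimal sketch in plain Python: `mean_residual_by_day_type` is a hypothetical helper, not part of eemeter, and the values are made up for illustration. It assumes the model predictions have already been computed for the same dates.

```python
from datetime import date
from statistics import mean


def mean_residual_by_day_type(dates, observed, predicted):
    """Average (observed - predicted) separately for weekdays and weekends.

    A positive mean means the model underestimates that day type,
    a negative mean means it overestimates it.
    """
    residuals = {"weekday": [], "weekend": []}
    for day, obs, pred in zip(dates, observed, predicted):
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        key = "weekend" if day.weekday() >= 5 else "weekday"
        residuals[key].append(obs - pred)
    # Skip a day type that has no observations
    return {key: mean(vals) for key, vals in residuals.items() if vals}


# Made-up illustrative values for Monday 7 to Sunday 13 January 2019
dates = [date(2019, 1, d) for d in range(7, 14)]
observed = [20, 21, 19, 20, 20, 5, 5]
predicted = [16] * 7  # a single model that ignores the day of the week

bias = mean_residual_by_day_type(dates, observed, predicted)
print(bias)
```

A weekday mean well above zero together with a weekend mean well below zero is the pattern described above.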
My idea was to include a day-of-week categorical variable in the regression model features (overriding the get_single_*_only_candidate_model methods). Do you have any advice on how to improve the daily model?
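To make the idea concrete, here is a minimal sketch of a regression with a weekend indicator, written in plain Python and independent of eemeter. It does not override any eemeter method. It assumes heating degree days (HDD) have already been computed for a fixed balance point, and it lets weekdays and weekends have different intercepts while sharing one HDD slope. `fit_shared_slope` and the data are hypothetical.

```python
from statistics import mean


def fit_shared_slope(hdd, usage, is_weekend):
    """Least-squares fit of usage = intercept[day type] + slope * hdd.

    Equivalent to ordinary least squares with a weekend dummy variable:
    the slope comes from the within-group deviations, and each group gets
    its own intercept. Both groups must contain data, and hdd must vary
    within at least one group.
    """
    groups = {False: ([], []), True: ([], [])}
    for x, y, weekend in zip(hdd, usage, is_weekend):
        xs, ys = groups[bool(weekend)]
        xs.append(x)
        ys.append(y)

    numerator = 0.0
    denominator = 0.0
    group_means = {}
    for key, (xs, ys) in groups.items():
        mean_x, mean_y = mean(xs), mean(ys)
        group_means[key] = (mean_x, mean_y)
        numerator += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        denominator += sum((x - mean_x) ** 2 for x in xs)

    slope = numerator / denominator
    intercepts = {
        key: mean_y - slope * mean_x
        for key, (mean_x, mean_y) in group_means.items()
    }
    return slope, intercepts


# Made-up data: weekdays follow 10 + 2 * hdd, weekends follow 3 + 2 * hdd
hdd = [0, 5, 10, 15, 2, 8]
is_weekend = [False, False, False, False, True, True]
usage = [10, 20, 30, 40, 7, 19]

slope, intercepts = fit_shared_slope(hdd, usage, is_weekend)
print(slope, intercepts[False], intercepts[True])
```

A shared slope is the simplest variant. If the building also responds differently to temperature on weekends, the slope would need to differ by day type as well, which amounts to fitting two separate models.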
Thank you!
Hi @stvilla, this is a really interesting issue. The daily method you're using is the reference implementation of the CalTRACK Daily/Billing methods, so the fix will need to be made in the CalTRACK methods. Others have found similar discrepancies between the daily and billing models, so I'm sure that a fix, if you are able to find one, would be of interest. I don't have a ready answer to this and would love some help. My suggestion is that if, after some experimentation, you do find a workaround, we merge something into the OpenEEmeter to deal with it, perhaps behind a flag that turns the new behavior on and off (please make a PR), and then also take it to the CalTRACK methods working group as a possible improvement.
Hi @philngo, my team and I have not yet found a solution that could be directly integrated into the OpenEEmeter, but we have found a solution that can maybe be generalized. As I mentioned, we have two clusters in the data corresponding to the different usage of the building (namely, weekends and working days).
We therefore decided to build a separate baseline model for each cluster, m1 and m2, and to compose the two resulting models into a single model M. In the next image, you can see the fit of the two models.
The next image shows the result on the time series data used as the reference period: the blue dashed line shows the model for the working days (which overestimates the weekends), and the orange dashed line shows the model for the weekends. The solid orange line is the result obtained with the composed model.
Finally, below I attach the code snippet used to generate the models. To obtain the desired result, we add a weights column to the design matrix returned by create_caltrack_daily_design_matrix and pass it to fit_caltrack_usage_per_day_model through the weights_col parameter, setting the weights of each cluster to zero in turn.
import numpy as np
import eemeter

# baseline_meter_data and temperature_data are assumed to be loaded already.

# WEEKDAY: weight 1 for Monday to Friday (dayofweek 0-4), 0 for the weekend
weekday_baseline_design_matrix = eemeter.create_caltrack_daily_design_matrix(
    baseline_meter_data, temperature_data,
)
weekday_baseline_design_matrix["W"] = np.where(
    weekday_baseline_design_matrix.index.dayofweek >= 5, 0, 1
)
weekday_baseline_model = eemeter.fit_caltrack_usage_per_day_model(
    weekday_baseline_design_matrix,
    weights_col="W",
)

# WEEKEND: weight 1 for Saturday and Sunday (dayofweek 5-6), 0 otherwise
# Assumes the same temperature series as the weekday fit.
weekend_baseline_design_matrix = eemeter.create_caltrack_daily_design_matrix(
    baseline_meter_data, temperature_data,
)
weekend_baseline_design_matrix["W"] = np.where(
    weekend_baseline_design_matrix.index.dayofweek >= 5, 1, 0
)
weekend_baseline_model = eemeter.fit_caltrack_usage_per_day_model(
    weekend_baseline_design_matrix,
    weights_col="W",
)
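The snippet above stops at the two fitted models. Composing them into the single model M amounts to choosing, for each day, the prediction of the model that matches the day type. The following is only a sketch in plain Python: `compose_predictions` is a hypothetical helper, the numbers are made up, and it assumes both models have already produced predictions for the same dates (how those are obtained from eemeter is not shown here).

```python
from datetime import date


def compose_predictions(dates, weekday_pred, weekend_pred):
    """Pick the weekend model on Saturday/Sunday and the weekday model otherwise."""
    return [
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        weekend if day.weekday() >= 5 else weekday
        for day, weekday, weekend in zip(dates, weekday_pred, weekend_pred)
    ]


# Made-up predictions for Friday 11 to Monday 14 January 2019
dates = [date(2019, 1, d) for d in range(11, 15)]
weekday_pred = [20, 21, 22, 23]  # hypothetical output of the weekday model
weekend_pred = [5, 6, 7, 8]      # hypothetical output of the weekend model

composed = compose_predictions(dates, weekday_pred, weekend_pred)
print(composed)
```

The same selection can be written on a pandas Series with a boolean mask built from index.dayofweek, mirroring the np.where calls used for the weights.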
I will make a PR if we find a solution that could be directly integrated inside the library.
Thank you!
A new daily model has been released that should resolve any weekday/weekend modeling discrepancies.