
Advice on how to use eemeter

Open stvilla opened this issue 3 years ago • 2 comments

Hello,

I'm interested in applying the eemeter library to data on a building's winter energy usage. I get consistent results in the monthly and weekly cases, but I'm having some difficulties applying the model to daily data.

Data

The data are for a building in Italy and cover 2019, 2020, and 2021 (data.zip). The meter data report the number of instants in which the heating unit was on. Below is a plot of the data.

[Image: plot of the meter data]

Weekly and monthly models

When the data are aggregated to monthly and weekly frequencies, the resulting models make sense. This is the code I'm using to generate the models:

import datetime

import pytz
import pandas as pd
import matplotlib.pyplot as plt

import eemeter


# Load data
meter_data_path = "./meter_data.csv"
temp_data_path = "./temperature_data.csv"

meter_data = pd.read_csv(meter_data_path, index_col=0)
meter_data.index = pd.to_datetime(meter_data.index)

temp_data = pd.read_csv(temp_data_path, index_col=0)

temp_data.index = pd.to_datetime(temp_data.index)
temp_data = temp_data.resample("1H").mean().interpolate(method="linear").value

temp_data = temp_data.loc[temp_data.index >= datetime.datetime(2019, 1, 1, tzinfo=pytz.utc)]


# Define parameters: "W" for weekly model, "M" for monthly model
time_freq = "W"
use_billing_presets = True
weights_col = "num_days"

# Aggregate hourly meter data: total usage and number of hourly readings per period
meter_data_agg = meter_data.value.dropna().resample(time_freq).agg(["sum", "size"])
meter_data_agg["num_days"] = meter_data_agg["size"] / 24  # days of data in each period

meter_data_agg = meter_data_agg.rename(columns={"sum": "value"})

# Create CalTRACK billing design matrix and extract baseline data (2019)
data = eemeter.create_caltrack_billing_design_matrix(meter_data_agg, temp_data)

# get_baseline_data returns a (data, warnings) tuple
baseline_df, baseline_warnings = eemeter.get_baseline_data(
    data,
    start=datetime.datetime(2019, 1, 1, tzinfo=pytz.utc),
    end=datetime.datetime(2019, 12, 31, tzinfo=pytz.utc),
    max_days=None,
)

# Add weights column (days of data per period) to baseline data
baseline_df[weights_col] = meter_data_agg[weights_col]

# Fit CalTRACK model
model_results = eemeter.fit_caltrack_usage_per_day_model(
    baseline_df,
    use_billing_presets=use_billing_presets,
    weights_col=weights_col,
)

# Plot resulting model
fig, ax = plt.subplots(2, 1, figsize=(12, 8))

ax[0].set_title("Reference period")
eemeter.plot_energy_signature(
    meter_data_agg.loc[meter_data_agg.index <= datetime.datetime(2020, 1, 1, tzinfo=pytz.utc)],
    temp_data, ax=ax[0])
model_results.plot(ax=ax[0], with_candidates=False)

ax[1].set_title("Whole dataset")
eemeter.plot_energy_signature(meter_data_agg, temp_data, ax=ax[1])
model_results.plot(ax=ax[1], with_candidates=False)

fig.subplots_adjust(hspace=0.5)

plt.show()

The code above generates the following two figures (with time_freq set to "W" and "M", respectively).

Weekly

[Image: weekly model, panels "Reference period" and "Whole dataset"]

Monthly

[Image: monthly model, panels "Reference period" and "Whole dataset"]

Daily data

The daily data show a strong dependence on the day of the week, with very different patterns on weekdays and weekends (see image below).

[Image: daily meter data showing the weekday/weekend pattern]
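The weekday/weekend pattern can be quantified with a day-of-week groupby before any modeling. A minimal pandas sketch, assuming the daily usage is a Series with a daily DatetimeIndex; the function names and the synthetic numbers are illustrative only, not part of eemeter:

```python
import numpy as np
import pandas as pd


def day_of_week_profile(daily_usage: pd.Series) -> pd.Series:
    """Mean daily usage per day of week (0 = Monday, ..., 6 = Sunday)."""
    profile = daily_usage.groupby(daily_usage.index.dayofweek).mean()
    profile.index.name = "dayofweek"
    return profile


def weekend_ratio(daily_usage: pd.Series) -> float:
    """Ratio of mean weekend (Sat/Sun) usage to mean weekday (Mon-Fri) usage."""
    is_weekend = daily_usage.index.dayofweek >= 5
    return daily_usage[is_weekend].mean() / daily_usage[~is_weekend].mean()


# Synthetic data for illustration: 10 units on weekdays, 2 on weekends.
idx = pd.date_range("2019-01-07", periods=14, freq="D")  # 2019-01-07 is a Monday
usage = pd.Series(np.where(idx.dayofweek >= 5, 2.0, 10.0), index=idx)
print(day_of_week_profile(usage))
print(weekend_ratio(usage))  # 0.2
```

A ratio far from 1 is a quick signal that a single temperature-only daily model will be biased by day type.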

Consequently, when I fit the CalTRACK daily model, I obtain a model that underestimates the weekday values and overestimates the weekends.

[Image: daily model result]

[Image: daily model result]
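The bias pattern (weekdays underestimated, weekends overestimated) can be checked numerically by averaging residuals per day type. A minimal sketch, assuming observed and predicted daily usage are pandas Series on the same daily index; `residual_bias_by_day_type` is a hypothetical helper, and the "model" here is just the overall mean:

```python
import numpy as np
import pandas as pd


def residual_bias_by_day_type(observed: pd.Series, predicted: pd.Series) -> pd.Series:
    """Mean residual (observed - predicted) for weekdays and weekends.

    Positive: the model underestimates that day type.
    Negative: the model overestimates it.
    """
    residuals = observed - predicted
    day_type = np.where(residuals.index.dayofweek >= 5, "weekend", "weekday")
    return residuals.groupby(day_type).mean()


idx = pd.date_range("2019-01-07", periods=14, freq="D")  # starts on a Monday
observed = pd.Series(np.where(idx.dayofweek >= 5, 2.0, 10.0), index=idx)
# A model that ignores the day of week, e.g. one predicting the overall mean.
predicted = pd.Series(observed.mean(), index=idx)
print(residual_bias_by_day_type(observed, predicted))
# The weekday mean residual is positive and the weekend one is negative.
```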

My idea was to include a day-of-week categorical variable among the regression features (by overriding the get_single_*_only_candidate_model functions). Do you have any advice on how to improve the daily model?
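The day-of-week idea can be sketched outside eemeter as an ordinary least-squares fit of daily usage on heating degree days (HDD) plus a weekend indicator. This is not the CalTRACK method and does not override eemeter's candidate-model functions; the fixed balance point and both helper names are hypothetical, and the data are synthetic:

```python
import numpy as np


def fit_hdd_weekend_model(temp, usage, is_weekend, balance_point):
    """Fit usage = b0 + b_hdd * HDD + b_we * weekend by ordinary least squares.

    temp, usage: 1-D arrays of daily mean temperature and daily usage.
    is_weekend: boolean array, True for Saturday/Sunday.
    balance_point: fixed heating balance point, in the same units as temp.
    Returns (b0, b_hdd, b_we).
    """
    hdd = np.maximum(balance_point - np.asarray(temp, dtype=float), 0.0)
    X = np.column_stack([np.ones_like(hdd), hdd, np.asarray(is_weekend, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(usage, dtype=float), rcond=None)
    return tuple(coef)


def predict_hdd_weekend_model(coef, temp, is_weekend, balance_point):
    """Predict daily usage from fitted (b0, b_hdd, b_we)."""
    b0, b_hdd, b_we = coef
    hdd = np.maximum(balance_point - np.asarray(temp, dtype=float), 0.0)
    return b0 + b_hdd * hdd + b_we * np.asarray(is_weekend, dtype=float)


# Noise-free synthetic data generated from b0=5, b_hdd=0.5, b_we=-3.
temp = np.array([2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 3.0, 6.0, 9.0])
is_weekend = np.array([False] * 5 + [True] * 2 + [False] * 3)
usage = 5.0 + 0.5 * np.maximum(18.0 - temp, 0.0) - 3.0 * is_weekend
coef = fit_hdd_weekend_model(temp, usage, is_weekend, balance_point=18.0)
print(coef)  # approximately (5.0, 0.5, -3.0)
```

A single additive weekend term shifts the intercept only; if weekends also have a different temperature slope, an interaction term (weekend * HDD) or separate models per day type would be needed.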

Thank you!

stvilla, Mar 09 '21 15:03

Hi @stvilla, this is a really interesting issue. The daily method you're using is the reference implementation of the CalTRACK Daily/Billing methods, so the fix will need to be made in the CalTRACK methods themselves. Others have found similar discrepancies between the daily and billing models, so if you are able to find a fix I'm sure it would be of interest. I don't have a ready answer and would love some help. My suggestion: if, after some experimentation, you find a workaround, we merge it into the OpenEEmeter behind a flag that turns the new behavior on and off (please make a PR), and you also take it to the CalTRACK methods working group as a possible improvement.

philngo, Mar 22 '21 15:03

Hi @philngo, my team and I have not yet found a solution that could be integrated directly into the OpenEEmeter, but we have found one that might be generalized. As I mentioned, the data contain two clusters corresponding to the different usage of the building (weekends and working days).

[Image: scatter plot of the two clusters (weekends and working days)]

We therefore built a separate baseline model for each cluster (m1 and m2) and composed the two into a single model M. The next image shows the fit of the two models.

[Image: fit of the two models]

The next image shows the result on the time series used as the reference period: the dashed blue line is the working-day model (which overestimates the weekends), and the dashed orange line is the weekend model. The solid orange line is the result of the composed model.

[Image: reference-period time series with the two models and the composed model]
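Composing the two models into M amounts to taking, for each day, the prediction of the model that matches that day's cluster. A minimal pandas sketch, assuming both models' daily predictions are already available as Series on the same index; `compose_day_type_predictions` is a hypothetical helper, not an eemeter function:

```python
import numpy as np
import pandas as pd


def compose_day_type_predictions(weekday_pred: pd.Series, weekend_pred: pd.Series) -> pd.Series:
    """Use weekend_pred on Saturday/Sunday and weekday_pred on Monday-Friday."""
    if not weekday_pred.index.equals(weekend_pred.index):
        raise ValueError("both predictions must share the same daily index")
    is_weekend = weekday_pred.index.dayofweek >= 5
    return pd.Series(
        np.where(is_weekend, weekend_pred.to_numpy(), weekday_pred.to_numpy()),
        index=weekday_pred.index,
        name="predicted_usage",
    )


idx = pd.date_range("2019-01-07", periods=7, freq="D")  # Monday to Sunday
working_day_model = pd.Series(10.0, index=idx)  # stand-in predictions
weekend_model = pd.Series(2.0, index=idx)       # stand-in predictions
M = compose_day_type_predictions(working_day_model, weekend_model)
print(M.tolist())  # [10.0, 10.0, 10.0, 10.0, 10.0, 2.0, 2.0]
```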

Finally, below is a code snippet used to generate the models. To obtain this result, we pass the weights_col parameter to fit_caltrack_usage_per_day_model, setting the weights of each cluster to zero in turn.

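import numpy as np
import eemeter

# baseline_meter_data: daily meter data for the baseline period
# temperature_data: hourly temperature series (assumed in degrees Fahrenheit,
# as CalTRACK's balance-point ranges expect); use the same series for both models

# WEEKDAY model: weight 1 on Monday-Friday, 0 on Saturday/Sunday
weekday_baseline_design_matrix = eemeter.create_caltrack_daily_design_matrix(
    baseline_meter_data, temperature_data,
)
weekday_baseline_design_matrix["W"] = np.where(
    weekday_baseline_design_matrix.index.dayofweek >= 5, 0, 1
)

weekday_baseline_model = eemeter.fit_caltrack_usage_per_day_model(
    weekday_baseline_design_matrix,
    weights_col="W",
)

# WEEKEND model: weight 1 on Saturday/Sunday, 0 on Monday-Friday
weekend_baseline_design_matrix = eemeter.create_caltrack_daily_design_matrix(
    baseline_meter_data, temperature_data,
)
weekend_baseline_design_matrix["W"] = np.where(
    weekend_baseline_design_matrix.index.dayofweek >= 5, 1, 0
)

weekend_baseline_model = eemeter.fit_caltrack_usage_per_day_model(
    weekend_baseline_design_matrix,
    weights_col="W",
)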
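Zeroing weights works because, in weighted least squares, a row with weight 0 contributes nothing to the objective sum of w_i (y_i - x_i^T beta)^2, so the coefficients equal an ordinary fit on the remaining days. A small numpy check of that equivalence, independent of eemeter and on synthetic data; it covers the coefficient estimates only, and whether eemeter's p-value and data-sufficiency checks treat zero-weight days the same way is not shown here:

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares by rescaling each row by sqrt(w)."""
    sw = np.sqrt(np.asarray(w, dtype=float))
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 28
hdd = rng.uniform(0.0, 15.0, n)
# Positions 5 and 6 of each 7-day block stand in for Saturday/Sunday.
is_weekend = (np.arange(n) % 7) >= 5
# Synthetic usage: different intercept and slope per day type, plus noise.
y = np.where(is_weekend, 2.0 + 0.1 * hdd, 8.0 + 0.6 * hdd) + rng.normal(0.0, 0.5, n)
X = np.column_stack([np.ones(n), hdd])

weekday_w = np.where(is_weekend, 0.0, 1.0)  # like the weekday "W" column
coef_weighted = wls(X, y, weekday_w)
coef_subset, *_ = np.linalg.lstsq(X[~is_weekend], y[~is_weekend], rcond=None)
print(np.allclose(coef_weighted, coef_subset))  # True
```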

I will make a PR if we find a solution that can be integrated directly into the library.

Thank you!

stvilla, Apr 07 '21 14:04

A net daily model has been released that should resolve any weekday/weekend modeling discrepancies.

travis-recurve, Mar 19 '24 19:03