
CalTRACK Issue: Statistical Measures for CalTRACK Hourly Method

Open c3-sarthakgupta opened this issue 10 months ago • 6 comments

Problem statement

Hi, I am implementing the CalTRACK Hourly Method for some M&V use cases. I am referring to the documentation here: https://docs.caltrack.org/en/latest/methods.html.

I have some questions about implementing the statistical measures in Section 4.3 for the Hourly Model. The two measures I am focusing on are CV(RMSE) and FSU, defined in the documentation as follows: [equation image] and [equation image]

I have the following questions regarding calculating these for the Hourly Model:

  1. The documentation provides values for the empirical coefficients for the "billing" and "daily" models as [image]

What values should be used for the hourly method?

  2. The values for "P" and "c" can differ across the 12 monthly models under the Hourly method. When reporting CV(RMSE) and FSU over a multi-month reporting period, does CalTRACK recommend a way to aggregate these values?
  3. Lastly, I wanted to confirm that the total number of periods "P" for a monthly model would correspond to the total number of hours in the month.

c3-sarthakgupta avatar Apr 05 '24 17:04 c3-sarthakgupta

  1. In the Sun and Baltazar ASHRAE conference paper they only determine these improved equations for monthly and daily interval data, so the best that can be done is to use the constant 1.26 in place of the polynomial.
  2. FSU is a normalized uncertainty metric, so you could multiply it by the monthly savings, resulting in the uncertainty of each model for the month. These could then be added together in quadrature and divided by the total savings for the year to get the FSU for the year.
  3. In the case of the billing/monthly model, P would be the number of months.
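
The aggregation described in point 2 can be sketched as follows. This is a minimal illustration, not a CalTRACK-specified procedure: the function name `aggregate_fsu` and the example numbers are hypothetical, and adding uncertainties in quadrature assumes the errors of the monthly models are independent, which is only an approximation when the models share training data.

```python
import math


def aggregate_fsu(monthly_fsu, monthly_savings):
    """Combine per-month FSU values into a single FSU for the whole period.

    Hypothetical helper: multiplies each FSU by its month's savings to get
    an absolute uncertainty, adds those in quadrature (assumes independent
    errors), then divides by the total savings.
    """
    if len(monthly_fsu) != len(monthly_savings):
        raise ValueError("monthly_fsu and monthly_savings must have equal length")

    # Absolute uncertainty of each month's savings.
    abs_uncertainty = [f * s for f, s in zip(monthly_fsu, monthly_savings)]

    # Root-sum-of-squares of the monthly uncertainties.
    total_uncertainty = math.sqrt(sum(u * u for u in abs_uncertainty))

    total_savings = sum(monthly_savings)
    if total_savings == 0:
        raise ValueError("total savings is zero; FSU is undefined")

    return total_uncertainty / abs(total_savings)


# Made-up example: two months, each with FSU = 0.5, saving 30 and 40 units.
period_fsu = aggregate_fsu([0.5, 0.5], [30.0, 40.0])
```

With these made-up inputs the monthly uncertainties are 15 and 20, which combine to 25, so the period FSU is 25/70.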

travis-recurve avatar Apr 16 '24 16:04 travis-recurve

Hi Travis, thank you so much for answering the questions. For question 3, I think I might have been unclear. What would be the value of P for the hourly method?

c3-sarthakgupta avatar Apr 18 '24 14:04 c3-sarthakgupta

P is the number of data points in the baseline period: 8760 minus however many hours you are missing. P' is the effective number of data points, taking into consideration the lag-1 autocorrelation. Q is the number of data points in the reporting period.
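
As a rough illustration of how P' relates to P, the sketch below uses the adjustment commonly given in ASHRAE Guideline 14, P' = P(1 - ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation of the baseline model residuals. The function names and the particular autocorrelation estimator are assumptions for illustration; check them against the CalTRACK documentation before relying on them.

```python
def lag1_autocorrelation(residuals):
    """Estimate the lag-1 autocorrelation of a residual series.

    Uses the common estimator sum(d[i] * d[i+1]) / sum(d[i]^2), where d are
    deviations from the mean. The choice of estimator is an assumption.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("residuals are constant; autocorrelation is undefined")
    numer = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return numer / denom


def effective_points(p, rho):
    """Effective number of baseline points: P' = P * (1 - rho) / (1 + rho)."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return p * (1 - rho) / (1 + rho)


# Made-up example: a full year of hourly data with rho = 0.5.
p_prime = effective_points(8760, 0.5)
```

Positive autocorrelation, which is typical for hourly residuals, makes P' smaller than P; in the made-up example above, 8760 points shrink to 2920 effective points.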

travis-recurve avatar Apr 18 '24 16:04 travis-recurve

Once again, thanks Travis. That answers more things for me. One question remains though: the hourly method requires training 12 models, one for every month of the year. Each of these models will have its own set of explanatory parameters, which will vary in both value and number. Consequently, c (the number of parameters) will differ from month to month.

And since each model is trained on a different dataset (3 calendar months), P and P' will differ across the 12 models as well (not 8760, I suppose).

Hence, when computing FSU for a given month in the reporting period, we should use the c and P (and correspondingly t and P') of that month's model, right? Then we can use the strategy you recommended to get an FSU over multiple months:

"FSU is a normalized uncertainty metric so you could multiply it by the monthly savings, resulting in the uncertainty of each model for the month. These could then be added together in quadrature and then divide by the total savings for the year to get the FSU for the year."

c3-sarthakgupta avatar Apr 22 '24 20:04 c3-sarthakgupta

I also realize that the degrees of freedom, defined by (P-c-1), will be greater than 100 for all 12 monthly models. So we could potentially use the same t=1.65, but CV(RMSE) will still change from month to month.
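
The t=1.65 figure can be sanity-checked with a short sketch. It assumes a two-sided 90% confidence level (an assumption; the thread does not state the level) and uses the standard normal quantile, which is the limit the t-distribution approaches as the degrees of freedom grow. The function name is hypothetical.

```python
from statistics import NormalDist


def large_sample_t(confidence=0.90):
    """Two-sided critical value in the large-degrees-of-freedom limit.

    Approximates the t-statistic by the standard normal quantile, which is
    reasonable only when the degrees of freedom (P - c - 1) are large.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return NormalDist().inv_cdf(0.5 + confidence / 2)


# Assumed 90% two-sided confidence level.
t_value = large_sample_t(0.90)
```

Under that assumption the limiting value is roughly 1.645. The exact t value at finite degrees of freedom sits slightly above this limit, so a single rounded value of 1.65 is a reasonable shortcut when every monthly model has well over 100 degrees of freedom.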

c3-sarthakgupta avatar Apr 22 '24 20:04 c3-sarthakgupta

You are correct. It would be 3 calendar months, but 2 of those months are weighted at 50%, so effectively it's 2 months. To be honest, I'm not totally sure what the proper way to handle this is. It might be easier to just leave it at the 1 month being modeled at a time and call it good enough. If you didn't, then how would you deal with P' when you're only predicting on 1 month? I guess you could include the same months that it was built on, but that seems strange to me.

travis-recurve avatar Apr 24 '24 18:04 travis-recurve