eemeter
Segmentation with holidays
Energy usage in buildings on holidays typically differs from usage on weekends or in other weekday-hour brackets. Segmentation lets us define a new map, like the following example, that separates holiday data from the rest. This should improve regression accuracy through more precise occupancy bins. One challenge, however, is the number of data points in the holiday segment: it must be large enough to prevent overfitting given the number of independent variables (such as 168 weekday-hours, temperature bins, etc.). I would recommend updating the segment_weights... and segment_time_series functions of segmentation.py to include holidays.
    "three_month_weighted": {
        "jan": "dec-jan-feb-weighted",
        "feb": "jan-feb-mar-weighted",
        "mar": "feb-mar-apr-weighted",
        "apr": "mar-apr-may-weighted",
        "may": "apr-may-jun-weighted",
        "jun": "may-jun-jul-weighted",
        "jul": "jun-jul-aug-weighted",
        "aug": "jul-aug-sep-weighted",
        "sep": "aug-sep-oct-weighted",
        "oct": "sep-oct-nov-weighted",
        "nov": "oct-nov-dec-weighted",
        "dec": "nov-dec-jan-weighted",
        "holiday": "holiday",
    },
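To make the idea concrete, here is a minimal standalone sketch of how hourly timestamps could be routed to either a "holiday" segment or the usual three-month-weighted segment, and how the resulting segment sizes could be checked. It is not eemeter's implementation: the helpers `three_month_weighted_name` and `segment_name_for` and the `HOLIDAYS_2024` set are hypothetical names chosen for illustration, and the holiday list is deliberately tiny.

```python
from collections import Counter
from datetime import date, datetime, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_weighted_name(month):
    """Build the segment name for a month number, e.g. 1 -> 'dec-jan-feb-weighted'."""
    prev_m = MONTHS[(month - 2) % 12]
    this_m = MONTHS[month - 1]
    next_m = MONTHS[month % 12]
    return f"{prev_m}-{this_m}-{next_m}-weighted"


def segment_name_for(ts, holiday_dates):
    """Hypothetical helper: holidays get their own segment, everything else
    falls back to the three-month-weighted segment of its month."""
    if ts.date() in holiday_dates:
        return "holiday"
    return three_month_weighted_name(ts.month)


# Assumption: a small hand-picked holiday set, for illustration only.
HOLIDAYS_2024 = {date(2024, 1, 1), date(2024, 7, 4), date(2024, 12, 25)}

# One year of hourly timestamps (2024 is a leap year, so 366 days).
start = datetime(2024, 1, 1)
hours = [start + timedelta(hours=h) for h in range(366 * 24)]

# Count how many hourly points land in each segment.
counts = Counter(segment_name_for(ts, HOLIDAYS_2024) for ts in hours)
print(counts["holiday"], counts["dec-jan-feb-weighted"])
```

With only a handful of holidays per year, the holiday segment holds far fewer hourly points than any monthly segment, which is the overfitting concern raised above.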
The new hourly model does not work quite like this.
The general idea of using holidays is good, but it's also complicated because every country, and even regions within a country, has its own holidays. There are Python packages that help with this, and we are considering adding holiday support to the new hourly model.
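As a rough illustration of why region matters, the sketch below keys a small hand-written holiday table by country and optional subdivision. The table `REGIONAL_HOLIDAYS` and the function `is_holiday` are hypothetical and cover only a few fixed-date holidays; movable holidays and observed-day rules are ignored, which is exactly the part a dedicated holiday package would take care of.

```python
from datetime import date

# Assumption: a tiny hand-written table of fixed-date holidays as (month, day),
# keyed by (country, subdivision). None means "applies country-wide".
REGIONAL_HOLIDAYS = {
    ("US", None): {(7, 4), (12, 25)},   # Independence Day, Christmas Day
    ("US", "CA"): {(3, 31)},            # Cesar Chavez Day (California)
    ("CA", None): {(7, 1), (12, 25)},   # Canada Day, Christmas Day
    ("CA", "QC"): {(6, 24)},            # Quebec's national holiday
}


def is_holiday(day, country, subdivision=None):
    """Return True if `day` is a holiday country-wide or in the given subdivision."""
    key = (day.month, day.day)
    national = REGIONAL_HOLIDAYS.get((country, None), set())
    regional = set()
    if subdivision is not None:
        regional = REGIONAL_HOLIDAYS.get((country, subdivision), set())
    return key in national or key in regional


# The same calendar date is a holiday in one place and a working day in another.
print(is_holiday(date(2024, 7, 4), "US"))        # US national holiday
print(is_holiday(date(2024, 7, 4), "CA"))        # ordinary day in Canada
print(is_holiday(date(2024, 3, 31), "US", "CA")) # holiday only in California
```

A meter's holiday flags therefore depend on where the building is, so any holiday segment would need the site's country and region as an input.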