Feature request: add option to automatically adjust PV forecast using historical actual/forecast data
I have tree shading/shadowing issues where the PV forecast can significantly overestimate PV production for certain times of the day, especially in different seasons. To illustrate, here is a chart showing the mean PV for each 30 minute block in December and June of last year (southern hemisphere summer vs winter). Due to northern tree shading, my PV production in winter is minimal until after midday.
Tree shading impacts are very consistent, so I suggest adding an option to publish an adjusted/dampened PV forecast sensor. The adjustment factor would be dynamically calculated, distinct for every 30 minute block in a 24 hour period, using history (fetched from HA) that compares forecast PV with actual PV production. There are a few ways this could be done. My rudimentary suggestion is:
- for every 30 minute interval over the last N days, calculate the ratio of forecast PV to mean actual PV in that 30 minute block, resulting in a set of N adjustment factors for every 30 minute block in a 24 hour period (N * 48 values)
- from that set, for every 30 minute block in a 24 hour period, calculate the mean of those N adjustment factors to determine a single adjustment factor for that block (48 adjustment factor values)
- when returning an adjusted PV forecast, for each 30 minute block look up the corresponding adjustment factor (from the set of 48) and apply it to the normal PV forecast value
- recalculate a new set of adjustment factors each night
- publish both the normal and adjusted PV forecasts in HA (as 2 sensors) because a history of the unadjusted normal PV forecast values will be needed to calculate future adjustment factors
- potentially publish the adjustment factors in HA too in case people might want to re-use them for other calculations
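The steps above can be sketched in pandas. This is a minimal illustration, not EMHASS code: the column names `pv_actual` and `pv_forecast` are placeholders, and I use the actual-to-forecast ratio so the factor can be applied by simple multiplication:

```python
import numpy as np
import pandas as pd

def compute_adjustment_factors(history: pd.DataFrame) -> pd.Series:
    """Compute 48 half-hourly adjustment factors from N days of history.

    `history` is assumed to have a DatetimeIndex at 30 minute resolution and
    two columns, 'pv_actual' and 'pv_forecast' (illustrative names only).
    """
    df = history.copy()
    # All rows belonging to the same 30 minute block of the day end up in
    # the same group, e.g. 13:30 on every one of the N days.
    block = df.index.hour * 2 + df.index.minute // 30
    # Ratio of actual to forecast for each interval; fall back to 1.0
    # overnight when the forecast is zero.
    ratio = (df["pv_actual"] / df["pv_forecast"]).where(df["pv_forecast"] > 0, 1.0)
    # Mean ratio per block -> one factor per 30 minute block (48 values).
    return ratio.groupby(block).mean()

def adjust_forecast(forecast: pd.Series, factors: pd.Series) -> pd.Series:
    """Apply the per-block factor to a new half-hourly forecast series."""
    block = forecast.index.hour * 2 + forecast.index.minute // 30
    return forecast * factors.reindex(block).to_numpy()
```

Recomputing `factors` once a night and publishing both series would then cover the remaining bullets.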
If anyone knows of better, more reliable, ways to calculate appropriate adjustment factors then please comment.
The goal is for the adjusted PV forecast to reflect the site-specific shading impacts relatively quickly (within N days) and to adjust gradually and automatically to seasonal changes. This way there is no need for people who have seasonal shading issues to manually calculate dampening factors themselves and add HA template sensors and automations to apply them.
The adjustment factor calculations only have to be done once a day, so should have a minimal performance impact.
I think N can default to 10, as I understand that is the default number of days HA keeps detailed sensor history. It would be good to have an option to change this for those who reconfigure HA to keep more days of sensor history.
I have very recently started using this method outside of emhass, using a hacked-together set of SQL scripts and Python code to apply adjustment factors to a Solcast forecast. However, I think it would be better to do this inside of emhass. I can try and have a go at implementing it myself and submitting a PR (but may need some guidance from time to time).
What are your thoughts?
Let's do it! I will open a PR that structures this implementation. As discussed on Discord, I'm inclined to go directly with the machine learning method, but it will be really nice to have a first baseline reference option such as what you describe. So I will open the PR soon (this week, hopefully) and then you can contribute your current method to that PR if you agree. Your list of suggestions is a first recipe for implementing this, if I got that right?
Thanks David, that sounds great! I'm not particularly attached to the method I suggested above, so if you do want to go directly to a machine learning model that would be absolutely fine. I only suggested the method above as one I am capable of implementing with my limited analytical skills. If you want me to try it out as a baseline comparison for the machine learning model then I'm happy to do that. I first need to convert my SQL queries into Python that queries the history data via the HA REST API. I can do that using my existing code outside of emhass and then attempt to bring it in once I know it's working - that way I'm not trying to get that code working at the same time as learning how to do it in emhass; once I know it works I can just focus on integrating it into emhass.
Working on this here #476
Hi. I've advanced quite a bit on this. The method seems to be working fine. I need to do more inspection and analysis, but at first glance it looks good. My testing data doesn't have shading problems, so it would be nice to test it on your setup @paulhomes to see if it catches those issues well. I've added the solar position as a feature, with the hope of being able to tell the difference between shading and just a cloudy day. The approach currently uses plain, basic machine learning with a lasso regression. I will try to finish packaging a working solution on #476 so that you can pull the image and test. And yes, it will be interesting to compare with your current working approach outside EMHASS. To compare the methods you will need to provide an evaluation metric; it can be RMSE or R2 or any other. Simple visual comparison is not enough IMO.
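For context, a bare-bones version of that kind of model could look like the sketch below. Everything here is synthetic and purely illustrative; the actual feature engineering and training code in #476 may be quite different:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Entirely synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "pv_forecast": rng.uniform(0, 5000, n),    # raw forecast power (W)
    "solar_elevation": rng.uniform(0, 70, n),  # sun altitude (degrees)
    "solar_azimuth": rng.uniform(60, 300, n),  # sun azimuth (degrees)
})
# Synthetic target mimicking shading: actual PV is dampened when the
# sun is low on the horizon.
y = X["pv_forecast"] * np.clip(X["solar_elevation"] / 40, 0, 1)

# Plain lasso regression from the raw features to the corrected PV value.
model = Lasso(alpha=1.0, max_iter=50000)
model.fit(X, y)
```

The idea of including solar position is exactly as described above: a low sun with a low actual/forecast ratio looks like shading, whereas a high sun with a uniformly low day looks like clouds.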
Thanks David. I'm very keen to try this out. The tree shading impacts on my panels are not too bad at the moment as we are just heading out of summer. I only have tree shading until about 8am currently. It will start to impact me much more in about 1-2 months time.
In terms of available testing data, I currently have HA in its default config, only keeping 10 days of detailed data in sqlite, but I plan on switching over to postgres soon and keeping much more from then on. Outside of HA I have daily 5 min interval PV data in CSV files going back at least 2 years, but unfortunately I do not have the associated forecast data.
I'll have a look into calculating RMSE and R2.
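For reference, both metrics are essentially one-liners with scikit-learn. A sketch on made-up half-hourly numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical half-hourly series: actual PV vs the (adjusted) forecast, in W.
actual = np.array([0.0, 120.0, 850.0, 2400.0, 3100.0, 1500.0])
forecast = np.array([0.0, 300.0, 900.0, 2200.0, 3300.0, 1400.0])

# RMSE: root of the mean squared error, in the same units as the data.
rmse = np.sqrt(mean_squared_error(actual, forecast))
# R2: fraction of the variance in `actual` explained by `forecast`.
r2 = r2_score(actual, forecast)
print(f"RMSE = {rmse:.1f} W, R2 = {r2:.3f}")
```

RMSE penalises large misses (in W here), while R2 is unitless, so the two complement each other when comparing methods.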
I've also been wondering about possible feedback effects of curtailment on the forecast adjustment. During daylight hours on sunny days we often have negative feed-in prices, so when/if it gets to a point in the afternoon where the battery is full, I can no longer consume all the available PV, and with the feed-in price negative, my HA automations configure the inverter to limit exports. This of course results in artificially lower PV values later in the afternoon and may look like (artificial) afternoon shading from the time curtailment is enabled. The difference, of course, is that the PV is actually available. I am wondering if it could end up in a gradual feedback loop: the forecast PV is adjusted down in the afternoon (lower than it needs to be because of prior days' curtailment), causing more grid power to be imported to charge the battery to a higher SoC earlier in the day, resulting in curtailment gradually creeping forward earlier in the day over several days/weeks. My brain is going round in circles thinking about this.
Yes, curtailment will also be something the algorithm needs to catch. I think it works with the current setup, but this indeed needs some analysis to check these cases.
And yes, you will need more history data to make this work, including the history of the previous forecasts. I use the base sql db from Home Assistant, but I've configured my recorder to store a custom list of sensors and states. No problem with more than 1 year of data.
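One simple way to keep curtailed intervals from biasing the fit, relating to the curtailment concern above, would be to mask them out of the training history. A tiny sketch, assuming the curtailment state from the HA automation is also recorded (the column name is purely illustrative):

```python
import pandas as pd

def drop_curtailed(history: pd.DataFrame) -> pd.DataFrame:
    """Exclude intervals where export limiting was active, so artificially
    low PV readings are not learned as 'shading'.

    Assumes a boolean 'curtailment_active' column recorded from the HA
    automation that limits the inverter (illustrative name only).
    """
    return history.loc[~history["curtailment_active"]]
```

This trades some training data for cleaner labels; whether that beats letting the model learn curtailment patterns directly would need the kind of analysis mentioned above.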
I have done some initial testing with this new feature in version 0.13 and run into an error with what looks like hard-coded sensor names in forecast.py.
In my config.json I have the following:
"set_use_adjusted_pv": true,
"sensor_power_photovoltaics": "sensor.total_dc_power",
"sensor_power_photovoltaics_forecast": "sensor.test_p_pv_forecast",
... and do have a sensor.test_p_pv_forecast from a test docker container that has been publishing with that prefix for a week.
When running a day ahead optimization I get the following stack trace and KeyError looking for 'sensor.power_photovoltaics':
emhass-test | [2025-04-03 13:33:00 +1000] [24] [INFO] Adjusting PV forecast, retrieving history data for model fit
emhass-test | [2025-04-03 13:33:00 +1000] [24] [INFO] Retrieve hass get data method initiated...
emhass-test | [2025-04-03 13:33:04 +1000] [24] [ERROR] Exception on /action/dayahead-optim [POST]
emhass-test | Traceback (most recent call last):
emhass-test | File "/app/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
emhass-test | return self._engine.get_loc(casted_key)
emhass-test | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
emhass-test | File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
emhass-test | File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
emhass-test | File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
emhass-test | File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
emhass-test | KeyError: 'sensor.power_photovoltaics'
emhass-test |
emhass-test | The above exception was the direct cause of the following exception:
emhass-test |
emhass-test | Traceback (most recent call last):
emhass-test | File "/app/.venv/lib/python3.12/site-packages/flask/app.py", line 1511, in wsgi_app
emhass-test | response = self.full_dispatch_request()
emhass-test | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/.venv/lib/python3.12/site-packages/flask/app.py", line 919, in full_dispatch_request
emhass-test | rv = self.handle_user_exception(e)
emhass-test | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/.venv/lib/python3.12/site-packages/flask/app.py", line 917, in full_dispatch_request
emhass-test | rv = self.dispatch_request()
emhass-test | ^^^^^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/.venv/lib/python3.12/site-packages/flask/app.py", line 902, in dispatch_request
emhass-test | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
emhass-test | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/src/emhass/web_server.py", line 414, in action_call
emhass-test | input_data_dict = set_input_data_dict(
emhass-test | ^^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/src/emhass/command_line.py", line 271, in set_input_data_dict
emhass-test | P_PV_forecast = adjust_pv_forecast(
emhass-test | ^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/src/emhass/command_line.py", line 129, in adjust_pv_forecast
emhass-test | fcst.adjust_pv_forecast_data_prep(df_input_data)
emhass-test | File "/app/src/emhass/forecast.py", line 756, in adjust_pv_forecast_data_prep
emhass-test | P_PV = data["sensor.power_photovoltaics"] # Actual PV production
emhass-test | ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 4102, in __getitem__
emhass-test | indexer = self.columns.get_loc(key)
emhass-test | ^^^^^^^^^^^^^^^^^^^^^^^^^
emhass-test | File "/app/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
emhass-test | raise KeyError(key) from err
emhass-test | KeyError: 'sensor.power_photovoltaics'
I can see lines 756 and 757 in forecast.py are as follows:
P_PV = data["sensor.power_photovoltaics"] # Actual PV production
P_PV_forecast = data["sensor.p_pv_forecast"] # Forecasted PV production
It looks like they need to be replaced with the sensor names specified in config.json.
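For illustration, this is the shape of the fix: index the history DataFrame with the entity ids taken from the configuration rather than with literals. The dict below is a stand-in for however the config is exposed internally, and the column values are made up:

```python
import pandas as pd

# Minimal stand-in for the history DataFrame returned from HA.
data = pd.DataFrame({
    "sensor.total_dc_power": [1200.0, 900.0],
    "sensor.test_p_pv_forecast": [1500.0, 1100.0],
})

# Entity ids as configured in config.json, instead of hard-coded literals.
conf = {
    "sensor_power_photovoltaics": "sensor.total_dc_power",
    "sensor_power_photovoltaics_forecast": "sensor.test_p_pv_forecast",
}

P_PV = data[conf["sensor_power_photovoltaics"]]            # actual PV
P_PV_forecast = data[conf["sensor_power_photovoltaics_forecast"]]  # forecast PV
```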
I added a PR #499 to fix this. It appears to work for me now with those changes.
With that PR in place, I now get the following when running a day-ahead optimization:
emhass-test | [2025-04-03 13:39:15 +1000] [23] [INFO] Adjusting PV forecast, retrieving history data for model fit
emhass-test | [2025-04-03 13:39:15 +1000] [23] [INFO] Retrieve hass get data method initiated...
emhass-test | /app/.venv/lib/python3.12/site-packages/sklearn/linear_model/_coordinate_descent.py:695: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.065e+05, tolerance: 4.572e+04
emhass-test | model = cd_fast.enet_coordinate_descent(
emhass-test | [2025-04-03 13:39:19 +1000] [23] [INFO] PV adjust Training metrics: RMSE = 620.9592899664684, R2 = 0.8430236275991677
emhass-test | [2025-04-03 13:39:19 +1000] [23] [INFO] Retrieving data from hass for load forecast using method = naive
emhass-test | [2025-04-03 13:39:19 +1000] [23] [INFO] Retrieve hass get data method initiated...
emhass-test | [2025-04-03 13:39:20 +1000] [23] [INFO] >> Performing dayahead optimization...
emhass-test | [2025-04-03 13:39:20 +1000] [23] [INFO] Performing day-ahead forecast optimization
emhass-test | [2025-04-03 13:39:20 +1000] [23] [INFO] Perform optimization for the day-ahead
emhass-test | [2025-04-03 13:39:21 +1000] [23] [INFO] Status: Optimal
emhass-test | [2025-04-03 13:39:21 +1000] [23] [INFO] Total value of the Cost function = 0.83
emhass-test | [2025-04-03 13:39:21 +1000] [23] [INFO] >> Sending rendered template table data
Is that ConvergenceWarning significant?
A minute later, from an MPC scheduled every 5 minutes, I get the following, which includes a SettingWithCopyWarning:
emhass-test | [2025-04-03 13:40:32 +1000] [23] [INFO] Adjusting PV forecast, retrieving history data for model fit
emhass-test | [2025-04-03 13:40:32 +1000] [23] [INFO] Retrieve hass get data method initiated...
emhass-test | /app/.venv/lib/python3.12/site-packages/sklearn/linear_model/_coordinate_descent.py:695: ConvergenceWarning:
emhass-test |
emhass-test | Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.065e+05, tolerance: 4.572e+04
emhass-test |
emhass-test | [2025-04-03 13:40:37 +1000] [23] [INFO] PV adjust Training metrics: RMSE = 620.7325127006322, R2 = 0.8431535874701463
emhass-test | [2025-04-03 13:40:37 +1000] [23] [INFO] Retrieving data from hass for load forecast using method = naive
emhass-test | [2025-04-03 13:40:37 +1000] [23] [INFO] Retrieve hass get data method initiated...
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] >> Performing naive MPC optimization...
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Performing naive MPC optimization
emhass-test | /app/src/emhass/forecast.py:1460: SettingWithCopyWarning:
emhass-test |
emhass-test |
emhass-test | A value is trying to be set on a copy of a slice from a DataFrame.
emhass-test | Try using .loc[row_indexer,col_indexer] = value instead
emhass-test |
emhass-test | See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
emhass-test |
emhass-test | /app/src/emhass/forecast.py:1532: SettingWithCopyWarning:
emhass-test |
emhass-test |
emhass-test | A value is trying to be set on a copy of a slice from a DataFrame.
emhass-test | Try using .loc[row_indexer,col_indexer] = value instead
emhass-test |
emhass-test | See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
emhass-test |
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Perform an iteration of a naive MPC controller
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Status: Optimal
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Total value of the Cost function = 1.18
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] >> Obtaining params:
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Passed runtime parameters: {'publish_prefix': 'test_'}
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] >> Setting input data dict
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Setting up needed data
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] >> Publishing data...
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Publishing data to HASS instance
emhass-test | [2025-04-03 13:40:38 +1000] [23] [WARNING] No saved entity json files in path:/data/entities
emhass-test | [2025-04-03 13:40:38 +1000] [23] [WARNING] Falling back to opt_res_latest
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_p_pv_forecast = 3502.58
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_p_load_forecast = 919.0
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_p_pv_curtailment = 0.0
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_p_hybrid_inverter = 3502.58
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_p_batt_forecast = 0.0
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_soc_batt_forecast = 67.0
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_p_grid_forecast = -2583.58
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_total_cost_fun_value = 3.27
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_optim_status = Optimal
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_unit_load_cost = 0.2
emhass-test | [2025-04-03 13:40:38 +1000] [23] [INFO] Successfully posted to sensor.test_unit_prod_price = 0.1
> Is that ConvergenceWarning significant?
It could be just one training iteration that did not converge. We are also doing a grid search to find the best parameters, so it might just be one evaluation in that search that did not go well. The best thing would be to inspect the results, but your R2 seems reasonably good.
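If the warning keeps appearing on every fit, the usual scikit-learn mitigations are the ones the warning text itself suggests: standardise the features and/or raise `max_iter`. A generic sketch on synthetic data (not the EMHASS model):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.uniform(0, 5000, (200, 3))                    # raw power-scale features
y = X @ np.array([0.5, 0.2, 0.1]) + rng.normal(0, 50, 200)

# Scaling the features and raising max_iter usually silences the
# ConvergenceWarning without meaningfully changing the fitted model.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=50000))
model.fit(X, y)
```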
I'll close this as complete. We can discuss the ongoing tests on Discord or in the discussion section.