openhab-core
Possible inconsistency in the persistence layer / possible Item state inconsistency
Hello guys,
@Jlaur, @florian-h05, @joerg1985, @mherwege, @J-N-K, @holgerfriedrich: I'm adding you to this issue because I see you have recently worked on the persistence layer and on the new time series functionality.
@lolodomo, @clinique, @lsiepel: I'm adding you as reviewers of the Linky binding.
@weymann: adding you too, as you were the first to respond to my question on this issue!
I'm currently working on a binding (Linky) that uses time series, and I have noticed some strange behavior.
I'm not sure whether I've done something wrong or whether there really is an inconsistency in the time series layer, so I would appreciate your expertise on the right way to do this and on how everything is supposed to work.
To limit the scope, I will only describe a subset of the binding.
Let's say I've got a monthly consumption collection, giving the consumption for (M-x, … M-2, M-1, MCurrent). This collection comes from my service provider. Each value is timestamped, currently at the beginning of the month, at 00:00:00. The value is a QuantityType(Double, Units.KILOWATT_HOUR).
What I would like, of course, is to plot the series in a graph and expose the MCurrent value as the current item state.
The data is refreshed each night, so MCurrent will be updated with a different value each morning.
Also, the persistence layer is InfluxDB, but I don't think that's really relevant: I observe similar behavior with InMemory persistence.
That is the context.
I've got an updateTimeSeries function that is quite simple:
Variant 1: just send the time series using sendTimeSeries.
// values: the provider's (timestamp, value) pairs, oldest first
TimeSeries timeSeries = new TimeSeries(Policy.REPLACE);
for (var entry : values) {
    timeSeries.add(entry.timestamp(), new QuantityType<>(entry.value(), Units.KILOWATT_HOUR));
}
sendTimeSeries(channelId, timeSeries);
Variant 2: add an updateState call after sendTimeSeries.
TimeSeries timeSeries = new TimeSeries(Policy.REPLACE);
for (var entry : values) {
    timeSeries.add(entry.timestamp(), new QuantityType<>(entry.value(), Units.KILOWATT_HOUR));
}
sendTimeSeries(channelId, timeSeries);
// also push the most recent value as the current item state
updateState(channelId, new QuantityType<>(values.get(values.size() - 1).value(), Units.KILOWATT_HOUR));
Variant 3: same as variant 2, but don't include the value for the current month in sendTimeSeries.
Let's go through the test cases!
1/ Scenario 1
In the first scenario, I use variant 1 of updateTimeSeries. What I observe is that the time series is correctly initialized. In InfluxDB, I see the following values:
Items Timestamp Value
----------- ------------------------------ ---------
Conso_Month 2024-12-31T23:00:00.000000000Z 2387.121
Conso_Month 2025-01-31T23:00:00.000000000Z 2666.551
Conso_Month 2025-02-28T23:00:00.000000000Z 1680.295
Here 1680.295 is the value for the current month. But if I display the item's current state, it is NULL.
OK. Now let's restart openHAB without modifying the time series at all. If I display the item's current state, it is now 1680.295.
This is because the persistence layer restores the current state from the last state in the time series table. This is done in the function:
org.openhab.core.persistence.internal.PersistenceManagerImpl.restoreItemStateIfPossible()
if (UnDefType.NULL.equals(state)) {
state = persistedItem.getState();
OK, fair enough. But it is strange that the state is not the same before and after restarting openHAB! Shouldn't sendTimeSeries initialize the state if it is currently NULL?
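To make the restore rule concrete, here is a minimal sketch in plain Java, not the actual PersistenceManagerImpl code. It assumes that the restore picks the newest persisted point that is not after the current time, and only when the item has no state yet. The Point record and the restoreIfNull method are hypothetical names used for illustration; a null argument stands in for the NULL state.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class RestoreSketch {
    // Hypothetical stand-in for one persisted row; not an openHAB type.
    record Point(Instant time, double value) {}

    // Returns the existing state if there is one (null models NULL),
    // otherwise the newest persisted value whose timestamp is not after 'now'.
    static Optional<Double> restoreIfNull(Double current, List<Point> persisted, Instant now) {
        if (current != null) {
            return Optional.of(current); // an existing state is never overwritten
        }
        return persisted.stream()
                .filter(p -> !p.time().isAfter(now))
                .max(Comparator.comparing(Point::time))
                .map(Point::value);
    }

    public static void main(String[] args) {
        List<Point> rows = List.of(
                new Point(Instant.parse("2024-12-31T23:00:00Z"), 2387.121),
                new Point(Instant.parse("2025-01-31T23:00:00Z"), 2666.551),
                new Point(Instant.parse("2025-02-28T23:00:00Z"), 1680.295));
        Instant now = Instant.parse("2025-03-21T14:39:00Z");
        System.out.println(restoreIfNull(null, rows, now)); // Optional[1680.295]
        System.out.println(restoreIfNull(42.0, rows, now)); // Optional[42.0]
    }
}
```

With the three rows from Scenario 1 and no state, the sketch yields 1680.295, which matches what is observed after the restart.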
If I query the REST service: http://localhost:8080/rest/persistence/items/Conso_Month?serviceId=influxdb&starttime=2024-09-15T12%3A33%3A19.935Z&endtime=2025-05-31T13%3A33%3A19.934Z&boundary=false&itemState=true
I get the following answer:
{"name": "Linky_Melody_Monthly_Conso_Month","datapoints": "3","data": [
{"time": 1735686000000,"state": "2387.121"},
{"time": 1738364400000,"state": "2666.551"},
{"time": 1740783600000,"state": "1680.295"}]}
This is in line with what we have in InfluxDB. But notice that itemState=true in the URL has no effect. You can set it to false or to true; it changes nothing, because the state is still NULL before the restart.
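The match between the REST timestamps (epoch milliseconds) and the InfluxDB rows (UTC) can be checked with java.time. The sketch below assumes the month boundaries are computed in the Europe/Paris zone; that zone is an assumption, chosen because it gives the one-hour offset seen in the data. MonthStartSketch and monthStart are hypothetical names.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneId;

final class MonthStartSketch {
    // Start of the given month at 00:00 local time, expressed as a UTC instant.
    static Instant monthStart(YearMonth month, ZoneId zone) {
        return month.atDay(1).atStartOfDay(zone).toInstant();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Paris"); // assumption: local zone of the installation
        for (int m = 1; m <= 3; m++) {
            Instant start = monthStart(YearMonth.of(2025, m), zone);
            System.out.println(start + " = " + start.toEpochMilli());
        }
        // 2024-12-31T23:00:00Z = 1735686000000
        // 2025-01-31T23:00:00Z = 1738364400000
        // 2025-02-28T23:00:00Z = 1740783600000
    }
}
```

Under that assumption, each value is stored at local midnight on the first day of its month, which is why the rows appear at 23:00 UTC on the last day of the previous month.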
After restarting, the result changes to:
{"name": "Linky_Melody_Monthly_Conso_Month","datapoints": "3","data": [
{"time": 1735686000000,"state": "2387.121"},
{"time": 1738364400000,"state": "2666.551"},
{"time": 1740783600000,"state": "1680.295"},
{"time": 1742568539279,"state": "1680.295"}
]}
There is an extra value at the end, whose time is the current timestamp. The value comes from the current state of the item, and the timestamp changes each time you refresh the page. This value is only returned when itemState=true in the URL.
This value comes from Florian's change in commit 389f6a34343a723442829b7d9841240adc2f2a9c, "[rest] Persistence: Optionally add current Item state to response (#4394)", in the following function: org.openhab.core.io.rest.core.internal.persistence.PersistenceResource.createDTO()
What I notice in this code is that we add the current state to the DTO collection without checking whether this state is already present as the last value before the current time.
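The kind of check meant here could look like the following sketch. It is plain Java with hypothetical names (DataPoint, withCurrentState), not the code of PersistenceResource.createDTO() and not necessarily what any pull request implements. It assumes the data points are sorted by ascending time.

```java
import java.util.ArrayList;
import java.util.List;

final class AppendCurrentStateSketch {
    // Hypothetical stand-in for one REST data point (epoch millis + state as text).
    record DataPoint(long time, String state) {}

    // Appends the current state at 'now' unless the last returned point
    // already carries the same state. A null currentState models NULL.
    static List<DataPoint> withCurrentState(List<DataPoint> data, String currentState, long now) {
        List<DataPoint> result = new ArrayList<>(data);
        boolean alreadyLast = !result.isEmpty()
                && result.get(result.size() - 1).state().equals(currentState);
        if (currentState != null && !alreadyLast) {
            result.add(new DataPoint(now, currentState));
        }
        return result;
    }

    public static void main(String[] args) {
        List<DataPoint> data = List.of(
                new DataPoint(1738364400000L, "2666.551"),
                new DataPoint(1740783600000L, "1680.295"));
        // Same state as the last point: nothing is appended.
        System.out.println(withCurrentState(data, "1680.295", 1742568539279L).size()); // 2
    }
}
```

One trade-off of such a check: a line chart would no longer be extended up to the current time when the state is unchanged, which may be the reason the point is added unconditionally today.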
2/ Scenario 2: In the second scenario, I use variant 2 of updateTimeSeries. What I observe is that the time series is correctly initialized, but I've got an extra line for the last month. In InfluxDB, I see the following values:
Items Timestamp Value
----------- ------------------------------ ---------
Conso_Month 2024-12-31T23:00:00.000000000Z 2387.121
Conso_Month 2025-01-31T23:00:00.000000000Z 2666.551
Conso_Month 2025-02-28T23:00:00.000000000Z 1680.295
Conso_Month 2025-03-21T14:39:00.000000000Z 1680.295
There are two lines for the current state 1680.295:
- One coming from sendTimeSeries, timestamped at the start of the month.
- One coming from updateState, timestamped at the current date/time.
If I display the current state of Conso_Month, I get a correctly initialized value of 1680.295, which was not the case in Scenario 1. Now if I call the REST service, I get the following result, even with itemState=false:
{"name": "Linky_Melody_Monthly_Conso_Month","datapoints": "3","data": [
{"time": 1735686000000,"state": "2387.121"},
{"time": 1738364400000,"state": "2666.551"},
{"time": 1740783600000,"state": "1680.295"},
{"time": 1742568539279,"state": "1680.295"}
]}
This is because of the extra line in InfluxDB, and because createDTO does not check whether the current state is already present in the collection.
And if we put itemState=true in the URL parameters, we now get the following result:
{"name": "Linky_Melody_Monthly_Conso_Month","datapoints": "3","data": [
{"time": 1735686000000,"state": "2387.121"},
{"time": 1738364400000,"state": "2666.551"},
{"time": 1740783600000,"state": "1680.295"},
{"time": 1742568539279,"state": "1680.295"},
{"time": 1742569464414,"state": "1680.295"}
]}
Of the three 1680.295 lines, the first comes from sendTimeSeries, the second from updateState, and the third from the createDTO() method!
Because of this, of course, if we plot it as a graph, we have duplicate values for the current month even with itemState=false on the graph.
This time, in scenario 2, if I restart openHAB, the result is the same before and after restarting the server.
3/ Scenario 3: In the third scenario, I use variant 3 of updateTimeSeries. What I observe is that the time series is correctly initialized, but the timestamp of the last line is aligned with the current date/time and not with the beginning of the month, because it was set by the updateState call.
Items Timestamp Value
----------- ------------------------------ ---------
Conso_Month 2024-12-31T23:00:00.000000000Z 2387.121
Conso_Month 2025-01-31T23:00:00.000000000Z 2666.551
Conso_Month 2025-03-21T15:44:57.395000000Z 1680.295
If I call the REST service, I get:
{"name": "Linky_Melody_Monthly_Conso_Month","datapoints": "3","data": [
{"time": 1735686000000,"state": "2387.121"},
{"time": 1738364400000,"state": "2666.551"},
{"time": 1742571897395,"state": "1680.295"},
{"time": 1742572007345,"state": "1680.295"}
]}
We still have a duplicate value for the current state because createDTO adds it. On restart, the result is the same before and after restarting the server.
As you can see, the behavior can be quite different depending on how you update your values, calling updateState or not. I can propose a pull request to try to align the behavior and make it more deterministic. But first I would like your advice.
1/ Do you see anything wrong in my analysis?
2/ When updating a time series, what is the correct way to do it?
a. Calling sendTimeSeries only.
b. Calling sendTimeSeries + updateState.
c. It should work whether you use solution a or solution b!
3/ When we call sendTimeSeries, shouldn't we automatically initialize the current state of the item if it is not already initialized (state = NULL)?
4/ In PersistenceResource.createDTO(), when we add the current state, shouldn't we check whether the last data point of the DTO collection equals the current state, and if so not add it again?
5/ Do you think there are unit tests that already cover this sort of thing? I have started reading them, but haven't found any so far.
6/ Do you have any other remarks that would be useful?
Maybe also include @jlaur in this. I believe he worked a lot on the original TimeSeries implementation.
TimeSeries are a recent addition, and they are not designed to directly impact the current state of an item. If you set TimeSeries, it will just update values in persistence, not the current state. And the question is if it should. You can very well set TimeSeries values in the past and have a current state that is different, or have more recent values already in persistence.
The standard way to put something in persistence is to update the item state, and that will trigger writing to persistence with the current timestamp. The only mechanism that updates the state from persistence is restoreOnStartup. Writing a time series to persistence will not.
There is one other case where the state will be set: future values persisted from a time series will set the current state as the current time moves past them.
I think the above perfectly explains all you see.
The next question is what to do. It shouldn't hurt if the same value is in persistence multiple times. So I think the approach of not just loading the time series, but also setting the state (and therefore writing an extra record in persistence) does no harm at all.
Hello Mark,
Thanks for this first answer. I don't know how I missed Jacob on this; I've added him to the list :)
I made a first attempt at solving my issue with the values returned from the REST service. You can see my proposal in this pull request: https://github.com/openhab/openhab-core/pull/4666. The tests seem conclusive, but I'm awaiting your comments on it.
Laurent.
Honestly, I still don’t see what exactly you want to achieve. Historic states in the persistence store should be static and not influence the current state. Future values can change (a forecast) and would impact current state as time moves forward, but become static when they get in the past. Why do you want to overwrite the past and have it impact the current state?
Mark,
Let me explain again; my first description was perhaps overcomplicated. I get the following values from my web service: January: 2387.121 kWh, February: 2666.551 kWh, March: 1736.903 kWh.
What I want (expected result) is simply:
- Display a consumption graph showing these monthly values.
- Display the March consumption, 1736.903 kWh, as the current item state, as we are in March.
What I get today with the current code is:
- If I call only sendTimeSeries with my values: the graph is OK, but the current state is NULL until I restart openHAB.
- If I call sendTimeSeries + updateState: the graph is wrong (I get an extra bar even if itemState=false), but the current state is OK.
I hope this is clearer.
And more than that:
- From a programmer's perspective, I don't see why there should be a difference between:
  - Calling only sendTimeSeries(timeseries).
  - Calling sendTimeSeries(timeseries) + updateState(state)
when the last value (before the current time) of sendTimeSeries == state. To my mind, this difference opens the door to inconsistent results and strange behaviour.
- I don't understand the difference in treatment between these two options.
  - If I call only sendTimeSeries, the current state is eventually initialized after rebooting, but with the timestamp of the beginning of the month, not the current timestamp.
  - If I call updateState, this adds a new row to the database with the current timestamp, and so I get a duplicate value from the REST service even if itemState=false.
Laurent.
I see a few issues with what you want to do. Ultimately your problem is a graphing problem, not a persistence problem.
First, anything you store in OH persistence is a state at a given point in time. A monthly consumption is not really a state. You would want to persist a consumption value which is a continuous value (always increasing over time for consumption). Think of it as a meter reading. Your monthly values would then be deltas between begin and end of the month. You could graph your meter reading in a line graph. Consumption by month could be a bar chart. I am not very familiar with the graphing in mainUI and I don’t know if you can do that, but Grafana definitely could.
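As a small illustration of the meter-reading idea, the sketch below derives monthly consumption as the difference between consecutive cumulative readings. The readings are made-up numbers, not values from this thread, and MonthlyDeltaSketch and monthlyConsumption are hypothetical names.

```java
import java.time.YearMonth;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class MonthlyDeltaSketch {
    // Input: cumulative meter reading (kWh) at the start of each month.
    // Output: consumption of a month = next month's reading minus this month's.
    static Map<YearMonth, Double> monthlyConsumption(SortedMap<YearMonth, Double> readingAtMonthStart) {
        Map<YearMonth, Double> result = new LinkedHashMap<>();
        Map.Entry<YearMonth, Double> previous = null;
        for (Map.Entry<YearMonth, Double> entry : readingAtMonthStart.entrySet()) {
            if (previous != null) {
                result.put(previous.getKey(), entry.getValue() - previous.getValue());
            }
            previous = entry;
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<YearMonth, Double> readings = new TreeMap<>();
        readings.put(YearMonth.of(2025, 1), 10000.0); // made-up readings
        readings.put(YearMonth.of(2025, 2), 12000.0);
        readings.put(YearMonth.of(2025, 3), 14500.0);
        System.out.println(monthlyConsumption(readings)); // {2025-01=2000.0, 2025-02=2500.0}
    }
}
```

In this model the running month has no delta yet until a newer reading arrives, which is consistent with the remark that the real consumption is only known at the end of the period.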
Second, I actually think you only have the real consumption at the end of the period. So it is a bit awkward to see it at the start.
What about using a separate item (channel) for the consumption in the current month? You can use time series to store as you do now to get the graph you want in one channel. Don’t restore this item at startup, so it never gets a state. Your graphs will be what you expect. And keep another channel (item) with the current month consumption. You set this with the current month consumption, but you don’t load a time series in this one. You can persist this and restore at startup.
> when the last value (before the current time) of sendTimeSeries == state. To my mind, this difference opens the door to inconsistent results and strange behaviour.
I think automatically updating state when loading a time series is actually more dangerous. How do you know the last value in the time series is the state? What if there already is a later value in persistence? What if the state is being updated at the same time through the normal events? E.g. think of a once per hour persistence strategy. The state of the item is already different from the last persisted state. You load a time series replacing the last persisted state. Should the state of the item now update as well?
I think your requirement comes from the fact you want to graph something, but what you want to graph is not actually a state in the sense of an OH state, which is dynamic and changes on events. Your data should not be updated by events at all. If that is what you need, just do that, don't restore on startup, and don't try to have an active state, just the persisted values.
> First, anything you store in OH persistence is a state at a given point in time. A monthly consumption is not really a state. You would want to persist a consumption value which is a continuous value (always increasing over time for consumption). Think of it as a meter reading. Your monthly values would then be deltas between begin and end of the month. You could graph your meter reading in a line graph. Consumption by month could be a bar chart. I am not very familiar with the graphing in mainUI and I don’t know if you can do that, but Grafana definitely could.
You said a monthly consumption is not really a state? I'm sorry, but I don't understand: your previous sentence said that anything stored in OH is a state at a given point in time. I agree more with that first assertion: everything is a state at some point. It can be a temporary state if the value continues to evolve day by day, in which case the state will be replaced by another state. It can also be a fixed state, once the value stops evolving when you move to the next month. Yes, we agree that it is a meter reading. But being a meter reading does not mean there is no state when we look at it from a certain point in time.
> Second, I actually think you only have the real consumption at the end of the period. So it is a bit awkward to see it at the start.
Yes, I agree with this. I tried moving it to the end of the month instead. But if I do so, I have an issue with my graph because the value is plotted under the next month. So I see the January consumption, for example, under the February axis marker.
> What about using a separate item (channel) for the consumption in the current month? You can use time series to store as you do now to get the graph you want in one channel. Don’t restore this item at startup, so it never gets a state. Your graphs will be what you expect. And keep another channel (item) with the current month consumption. You set this with the current month consumption, but you don’t load a time series in this one. You can persist this and restore at startup.
From a user's perspective, I would not be very happy to have to look at different graphs / different items to see the full history. What I want is to look at a single graph each morning and see my current consumption for the month compared with previous months. Even if we are not at the end of the month, I can infer, for example, that I'm in the middle of the month but my consumption is already at 70% of last month's consumption, which is bad.
> I think automatically updating state when loading a time series is actually more dangerous. How do you know the last value in the time series is the state? What if there already is a later value in persistence? What if the state is being updated at the same time through the normal events? E.g. think of a once per hour persistence strategy. The state of the item is already different from the last persisted state. You load a time series replacing the last persisted state. Should the state of the item now update as well?
But this is already what we do when restarting openHAB. The state is initialized from the last persisted value before the current timestamp. So shouldn't we either never do it, or do it in all cases? I know it is the state because it is the last value in the data, and because the item currently has no state (== NULL). So no other value in persistence would apply at this time. If the state is being updated at the same time, that would just override the state with a new value, no? I don't want to put a new value in the time series, just update the state if it is not initialized. As for the hourly strategy, once again, it would replace the state only if the current state is NULL, so no, it would not apply in this case.
> I think your requirement comes from the fact you want to graph something, but what you want to graph is not actually a state in the sense of an OH state, which is dynamic and changes on events. Your data should not be updated by events at all. If that is what you need, just do that, don’t restore on startup, and don’t try to have an active state, just the persisted values.
That's disappointing. Why do we have a REPLACE policy on time series, if not for data that can evolve over time? I don't see how this differs from what is done in other bindings.
> That's disappointing. Why do we have a REPLACE policy on time series, if not for data that can evolve over time? I don't see how this differs from what is done in other bindings.
So far, I have not seen a single binding that uses time series to update past values. This REPLACE strategy is foremost designed to replace a forecast (future values) with an updated forecast. You can update past values with it for some persistence services (e.g. not for rrd4j), but I think your expectation that that would impact current item state is wrong.
Here is another suggestion: You can actually have everything in one item, just don’t persist the item. In that case, restoreOnStartup will restore the state of the item, but will not create an extra value in persistence. When you load your time series, also explicitly update the state of the item. As it will not be persisted, there won’t be an extra value in persistence. That way you avoid the conflict of writing to persistence from the time series and the state update, and you still have the state in the item.
> You said a monthly consumption is not really a state? I'm sorry, but I don't understand: your previous sentence said that anything stored in OH is a state at a given point in time.
What you do is write some values to a DB to be able to make the graphs work a certain way. That’s why I do not call it a state. It is not written by an item state change. If it were, you should see increasing values within the month as the consumption for the month adds up, and not a single value positioned at the month start. The frequency of that would be controlled by the persistence strategy, but the pattern would be the same. So, that’s why I don’t call what you have in persistence a value reflecting the state at that point in time. And if that is what you want from persistence, that is fine. But don’t generalize this, as it totally moves away from the state at a point in time persistence idea.