openhab-core icon indicating copy to clipboard operation
openhab-core copied to clipboard

OH5 Snapshots - Rules reporting error

Open psmedley opened this issue 6 months ago • 2 comments

Expected Behavior

Rule runs to completion as it does with OH4.3.5

Current Behavior

After updating to OH5 M2 or latest snapshots, I get the following: 2025-05-15 18:53:44.111 [ERROR] [e.automation.internal.RuleEngineImpl] - Failed to execute rule 'af4c760780': Failed to execute action: 7(Cannot invoke "com.google.inject.Injector.getInstance(java.lang.Class)" because the return value of "org.openhab.core.model.script.ScriptStandaloneSetup.getInjector()" is null) Action: 7 contains: `actions:

  • inputs: {} id: "7" configuration: type: application/vnd.openhab.dsl.rule script: > var Number requiredSolar = ((TeslaPowerwall_InstanteousHomePower.state as Number)+(1 - (TeslaPowerwall_BatterySOE.state as Number))20000)/5000100;

    if (requiredSolar > 100) requiredSolar = 100;
    
    if (EVChargingGroup.state == ON) requiredSolar = 100;
    
    if (NegativeFIT.state == OFF) requiredSolar = 100;
    
    if (NegativeGrid.state == ON) requiredSolar = 1;
    
    logInfo("curtailSolar", "required solar percent = {}, Current Inverter
    limit = {}", requiredSolar.intValue,
    SolarEdge__Modbus_Data__Active_Power_Limit__61441_Value_as_Number.state);
    
    
    if
    ((SolarEdge__Modbus_Data__Active_Power_Limit__61441_Value_as_Number.state
    as Number) != requiredSolar.intValue) {
        SolarEdge__Modbus_Data__Advanced_Power_Enable__61762_Value_as_Number.sendCommand(1);
        if ((TeslaPowerwall_BatterySOE.state as Number) > 0.97) {
          logInfo("curtailSolar", "current PW2 charge ({}) > 97%, curtail output",(TeslaPowerwall_BatterySOE.state));
          SolarEdge__Modbus_Data__Active_Power_Limit__61441_Value_as_Number.sendCommand(requiredSolar.intValue);
          DaikinAirbaseACUnit_Zone4.sendCommand(ON);
        } else {
          logInfo("curtailSolar", "current PW2 charge ({}) < 97%, set full rate production",(TeslaPowerwall_BatterySOE.state));
          SolarEdge__Modbus_Data__Active_Power_Limit__61441_Value_as_Number.sendCommand(100);
        }
        SolarEdge__Modbus_Data__Commit_Power_Control__61696_Value_as_Number.sendCommand(1);
    } else {
        logInfo("curtailSolar", "required solar matches current inverter limit, no change required.");
    }
    

    type: script.ScriptAction ` From what I've seen - any rules with scriptactions with rulesdsl are impacted.

Possible Solution

Rollback to 4.3.5

Steps to Reproduce (for Bugs)

See above

Context

Trying to test OH 5.0

Your Environment

  • OH 5.0 snapshot build 4644

psmedley avatar May 15 '25 09:05 psmedley

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/oh5-rule-errors/164121/1

openhab-bot avatar May 16 '25 23:05 openhab-bot

After cleaning the cache and tmp dir with a restart the problem disappeared for me.

lsiepel avatar May 29 '25 16:05 lsiepel

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-5-0-milestone-discussion/162686/236

openhab-bot avatar Jul 09 '25 05:07 openhab-bot

After cleaning the cache and tmp dir with a restart the problem disappeared for me.

Someone on the community forum said that it does not help.

@florian-h05 and @Nadahar : you are probably the only ones who made changes regarding rules engines in OH5. Any idea what could be wrong suddenly?

lolodomo avatar Jul 09 '25 17:07 lolodomo

I don't find a lot of changes...

@holgerfriedrich : do you think it could be related to dependencies upgrades?

lolodomo avatar Jul 09 '25 17:07 lolodomo

Can we reproduce the issue ?

lsiepel avatar Jul 09 '25 17:07 lsiepel

I have no problem myself with my DSL rules but it seems to be related to DSL scripts and I never used such scripts.

lolodomo avatar Jul 09 '25 18:07 lolodomo

My rules PR (#4633) is still unmerged. The only thing I can think of that I've done is some minor changes related to regeneration of template based rules in #4718. None of that really touches the code that performs Actions, so I can't quite imagine how that code could be related to the problem at hand, and #4718 was merged May 26th, after this issue was created.

Looking at the stack traces, no bells ring either. But, there have been done some things related to Action in 5.0, some Actions are now available from MainUI. Could there be something in those changes?

Apart from that, it kind of smells a bit of dependency trouble, since the error is related to Guice, which deals with dependency injection.

Nadahar avatar Jul 09 '25 21:07 Nadahar

@lolodomo I am not familiar with the rule engine. I cannot tell if this could be related to dependency changes.

@psmedley Could you pls. try to modify the rule and save it? Is the issue gone after that? (This was reported to resolve the issue in the other forum thread).

holgerfriedrich avatar Jul 09 '25 23:07 holgerfriedrich

@holgerfriedrich It doesn't look like this is that related to the rule engine itself. It's the call to ActionHandler.execute() that via the ActionHandler implementation in ScriptActionHandler

https://github.com/openhab/openhab-core/blob/2ed2a3aba346eee01bd287f560b9cbec4b51ff52/bundles/org.openhab.core.automation.module.script/src/main/java/org/openhab/core/automation/module/script/internal/handler/ScriptActionHandler.java#L94

...ends up (and fails) here:

https://github.com/openhab/openhab-core/blob/2ed2a3aba346eee01bd287f560b9cbec4b51ff52/bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/ScriptEngineImpl.java#L77

The whole Xtext/Xtend stuff is unfamiliar to me, and I don't know exactly where Guice comes in, but none of the "directly involved" classes seem to have had recent changes.

Nadahar avatar Jul 10 '25 01:07 Nadahar

@lolodomo I am not familiar with the rule engine. I cannot tell if this could be related to dependency changes.

@psmedley Could you pls. try to modify the rule and save it? Is the issue gone after that? (This was reported to resolve the issue in the other forum thread).

I ported my rules to JavaScript so don't have the problem anymore. I probably have a backup I could restore on another machine to test. Will see if I can do this tonight

psmedley avatar Jul 10 '25 01:07 psmedley

But, there have been done some things related to Action in 5.0, some Actions are now available from MainUI. Could there be something in those changes?

It was introduced in OH 4.3, not OH 5.

lolodomo avatar Jul 10 '25 04:07 lolodomo

Maybe a starting issue like UI rules being loaded before the DSL script engine is really ready?

lolodomo avatar Jul 10 '25 04:07 lolodomo

Injector is created by call of https://github.com/openhab/openhab-core/blob/2ed2a3aba346eee01bd287f560b9cbec4b51ff52/bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/ScriptEngineImpl.java#L71

I see there were changes by you @holgerfriedrich on this file but I don't find evident problem with your changes in that class: https://github.com/openhab/openhab-core/commit/40d079ae809d81376321e002b39707bab4dc6944#diff-087fc71cc28c4a2cdb4cef346e2c55603eab23487ee163cba639e2115c7ac2b8

lolodomo avatar Jul 10 '25 05:07 lolodomo

The xtend file itself was also modified by you @holgerfriedrich by this PR: https://github.com/openhab/openhab-core/commit/52e1185e413dd18f50adcd35a0f87cbdc77d8afc

The changes in our xtend file are fine but in that PR I also see ".initializeLater" removed in ScriptJvmModelInferrer class, I don't know what it was and what is the consequence of this change ?

lolodomo avatar Jul 10 '25 05:07 lolodomo

@lolodomo I remember that I compared the whole output folders for this PR. Let me check that again, just for this specific change....

holgerfriedrich avatar Jul 10 '25 05:07 holgerfriedrich

I also found PR #4606 from @jimtng that updated DSLScriptEngine.

lolodomo avatar Jul 10 '25 06:07 lolodomo

@lolodomo PR #4578 was merged Feb 15, before M1. I just looked at the generator output in xtend-gen and src-gen folder in org.openhab.core.model.script.runtime if I change back to the deprecated initializeLater (I did not roll back the full PR).

Image

This is the only difference in folder xtend-gen, and seems to follow the recommendation on the deprecation list: https://archive.eclipse.org/modeling/tmf/xtext/javadoc/2.8/org/eclipse/xtext/xbase/jvmmodel/IJvmDeclaredTypeAcceptor.IPostIndexingInitializing.html#initializeLater(org.eclipse.xtext.xbase.lib.Procedures.Procedure1)

src-gen is folder is unchanged.

Maybe I can do another diff later which includes the other changes since 4.x as well.

holgerfriedrich avatar Jul 10 '25 06:07 holgerfriedrich

Maybe someone should try to reproduce the problem first.

lolodomo avatar Jul 10 '25 06:07 lolodomo

@florian-h05 [...]: you are probably the only ones who made changes regarding rules engines in OH5. Any idea what could be wrong suddenly?

I haven't made changes to the rule engine in 5.0, just added a new script extension. Unfortunately I am no rules DSL user either since many years (everything JS Scripting), so I can't help debugging/testing.

florian-h05 avatar Jul 10 '25 10:07 florian-h05

I'm limited to guessing here, since I don't know the inner workings of Xtext/Xtend. But, can someone confirm whether or not the DSL rules are "precompiled" and cached or not? The reported "solutions" are to delete the cache (for some) or re-save the script (for some). This sounds strange to me, because I wouldn't expect either of these things to matter. I would imagine that if DSL rules are "precompiled", that would happen during every restart/rule reparse.

Thus, this could come down to some kind of timing problem - what is and isn't ready by the time the rule is either "compiled" or the action is executed, depending on if there is any "precompilation".

I do use DSL rules on my production system, but that doesn't help - my production system runs 4.2.3, and upgrading isn't trivial since the system is 32 bit. Trying to copy my rules to a 5.0 dev installation won't work either, because the involved Items etc. doesn't exist there - and can't easily be moved or mocked.

Anybody can dump some DSL rules in any installation though, regardless of if this is what they use in their production system. It's hard to know if this will actually impact all DSL rules or not, or if specific circumstances need to exist to trigger it. If it's timing related, it could also be that you'd have to run OH under Karaf to get the right circumstances.

It would be helpful if somebody has a relatively simple rules that trigger this problem, so that it doesn't require simulating a lot of circumstances to reproduce the problem.

Nadahar avatar Jul 10 '25 11:07 Nadahar

I tried to make a simple mock in a test installation, I installed my own binding (since I have a device it can connect to), created an Item not linked to anything, and then created this rule

rule "Test"
when
	Item Mill_Panelovn_Operasjonsmodus changed from OFF
then
	logInfo("Rule","Enabling indicator")
	// Enable indicator
	test.postUpdate(ON)
end

This isn't enough to reproduce the problem, I can see no trace of this error when restarting this installation.

Maybe the rule needs to be triggered during OH startup?

Nadahar avatar Jul 10 '25 14:07 Nadahar

@Nadahar : I believe there is no problem with DSL rules defined in a file. As an example, all my rules are working well. The problem concerns rules defined in UI (or maybe rules calling a script?).

From what I've seen - any rules with scriptactions with rulesdsl are impacted.

lolodomo avatar Jul 11 '25 02:07 lolodomo

@Nadahar : I believe there is no problem with DSL rules defined in a file. As an example, all my rules are working well. The problem concerns rules defined in UI (or maybe rules calling a script?).

From what I've seen - any rules with scriptactions with rulesdsl are impacted.

I've made a managed rule with a scripted DSL Action, but I get the same result: No error. There must be something more needed to trigger the problem.

From working with rules, I know that once the rule has been parsed, it's basically no difference between a file-based and a managed DSL rule. The when part is parsed into one or more Trigger(s), and then then part is parsed into an Action. That's why I didn't really consider managed or not as relevant, but of course, if it's a timing problem, "everything" can be relevant as to whether the issue manifests.

Nadahar avatar Jul 11 '25 03:07 Nadahar

One option would be to add some DEBUG logs to understand in what order things are started when the problem occurs.

lolodomo avatar Jul 13 '25 11:07 lolodomo

One option would be to add some DEBUG logs to understand in what order things are started when the problem occurs.

Yes, if we could get somebody that (still) experiences the issue to do it.

Nadahar avatar Jul 13 '25 15:07 Nadahar

At least we can advice a workaround, updating and saving the failing rule.

lolodomo avatar Jul 13 '25 15:07 lolodomo

At least we can advice a workaround, updating and saving the failing rule.

It's still not clear to me if that is a permanent fix. I find it peculiar if it is.

Nadahar avatar Jul 13 '25 15:07 Nadahar

At least we can advice a workaround, updating and saving the failing rule.

It's still not clear to me if that is a permanent fix. I find it peculiar if it is.

Yes, me too. And that is true that I got no answer to my question whether the workaround survived to a OH restart.

lolodomo avatar Jul 13 '25 16:07 lolodomo

Unfortunately I'm unable to reproduce as I migrated my rules to JavaScript.

Restoring an old backup didn't help either.

psmedley avatar Jul 13 '25 21:07 psmedley