openhab-core
[voice] Unable to identify the last active dialog processor.
When using openHAB to process a voice command and trigger a rule, I want to say a confirmation about whether the rule has failed or not, but from that rule I'm not able to identify the last active dialog processor to select the appropriate sink and say the message through the correct speaker. I'm not sure what the best solution for this would be.
@lolodomo did you face this problem?
How can a voice command trigger a rule?
By using the 'Action Template Interpreter' or the built-in 'Rule Voice Interpreter' you can update items to trigger rules, so you can implement custom behaviors, but from that rule there is no easy way to identify which processor (actually I think just the sink and source are relevant) triggered the change, so you cannot do further voice interaction with the voice actions, as you don't know where the message came from.
What I'm facing at this moment is: I have implemented a voice command to play my Jellyfin media. I use the 'Action Template Interpreter' addon to catch the voice command "Play $* in $room" and update an item. From there a rule starts its execution, takes the command, searches an index of my Jellyfin library for the most similar title, turns on the TV of the selected room, starts the Jellyfin app and starts the media. I have also implemented the command "Continue playing in $room" to move the current media from room to room. My problem is that I want to say a confirmation message like "Play whatever in the living room" using the sink of the dialog processor I'm using at that moment, but I realize there is no easy way to do this. I think it is a situation more people may face in the future and not an edge case.
One way to work around this at the moment is using different listening items on the different processors and triggering a rule when any of them changes to store the assistant identity in another item, but it seems like something that should be easier to achieve.
I think that allowing the user to configure two extra items in the voice settings to hold the last processor sink/source (and keeping those updated "automatically" like the listening item) could be an acceptable/easy solution.
Normally it is the dialogue processor that should talk, not a rule that is indirectly triggered by the dialogue processor. Having extra items would not really be a reliable solution, as it would work only if you run a unique dialogue processor. With several running dialogue processors, you could have concurrent dialogues, and when your rule is finally run, the last saved dialogue processor is not necessarily the one you expect.

I have no good idea how to handle this "context". One idea (which does not fully solve this problem) could be to add a new say method that takes a dialogue processor id as parameter. This method would be callable in any rule and would ask the targeted dialogue processor to say something. But this solution does not solve the problem of how to store this processor id to reuse it in a rule.
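Just to illustrate the idea, here is a rough sketch of what such a method could look like (nothing like this exists in the core today; the interface and method names are only placeholders):

```java
/**
 * Rough sketch only, not existing openHAB API: a say variant addressing a running
 * dialogue processor by id instead of an explicit voice/sink pair.
 */
public interface DialogAwareSpeech {

    /**
     * Ask the dialogue processor identified by dialogId to speak the given text,
     * reusing that dialogue's configured TTS voice and audio sink.
     * Intended to be callable from any rule that knows the dialog id.
     */
    void sayInDialog(String dialogId, String text);
}
```

A distinct method name would probably be needed anyway, to avoid ambiguity with the existing say(text, voiceId) overload.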
Thank you for your comments @lolodomo.
You are right, my solution is not reliable if the voice manager allows multiple dialogs to be executed. For some time I have had in mind opening an issue/PR to add a new option to the voice manager to disable concurrent dialog execution, as it is something I don't want in my case and should be handled by the server (to prevent starting multiple dialogs in case more than one speaker spots the activation word, like other assistant systems do). But even if that is implemented, after reading your comments I think what I proposed is not a good idea.
And you are right that in a normal situation the HLI handles the talking.
I came to an idea when you used the word "context". Maybe the RuleHumanLanguageInterpreter and other interpreters that allow updating items could be in charge of adding this info to the item as metadata before sending the item command. That would require a modification of the HumanLanguageInterpreter interface to pass the dialog information into the interpret method, but I think it is the most reliable way, because if you read this metadata at the beginning of the rule you have little chance of collisions. WDYT of this?
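Just to make the idea concrete, a rough sketch of what the interpreter side could do before posting the command; the class name and the "dialog" metadata namespace are made up for illustration, only MetadataRegistry, Metadata and ItemEventFactory are existing core API:

```java
import java.util.Map;

import org.openhab.core.events.EventPublisher;
import org.openhab.core.items.Metadata;
import org.openhab.core.items.MetadataKey;
import org.openhab.core.items.MetadataRegistry;
import org.openhab.core.items.events.ItemEventFactory;
import org.openhab.core.library.types.StringType;

/** Rough sketch: store the dialog source/sink as item metadata, then send the command. */
public class DialogContextMetadataWriter {

    private static final String NAMESPACE = "dialog"; // illustrative metadata namespace

    private final MetadataRegistry metadataRegistry;
    private final EventPublisher eventPublisher;

    public DialogContextMetadataWriter(MetadataRegistry metadataRegistry, EventPublisher eventPublisher) {
        this.metadataRegistry = metadataRegistry;
        this.eventPublisher = eventPublisher;
    }

    public void postWithDialogContext(String itemName, String text, String sourceId, String sinkId) {
        MetadataKey key = new MetadataKey(NAMESPACE, itemName);
        Metadata metadata = new Metadata(key, "lastDialog",
                Map.<String, Object>of("sourceId", sourceId, "sinkId", sinkId));
        if (metadataRegistry.get(key) == null) {
            metadataRegistry.add(metadata);
        } else {
            metadataRegistry.update(metadata);
        }
        // Send the recognized text as a command, as the interpreter already does today.
        eventPublisher.post(ItemEventFactory.createCommandEvent(itemName, new StringType(text)));
    }
}
```

A rule triggered by the item could then read that metadata back at the beginning of its execution to know which sink to answer on.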
Edit: Instead of modifying the HumanLanguageInterpreter interface, we could create an abstract class (called AbstractRuleInterpreter or something like that) which implements HumanLanguageInterpreter and add an overloaded interpret method there, then check in the voice manager whether an interpreter is an instance of this class to propagate the dialog info, so interpreters which target rule execution can extend it.
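A rough sketch of what I mean (this class does not exist, and the extra parameters here are just the source/sink ids to keep it simple):

```java
import java.util.Locale;

import org.eclipse.jdt.annotation.Nullable;
import org.openhab.core.voice.text.HumanLanguageInterpreter;
import org.openhab.core.voice.text.InterpretationException;

/**
 * Rough sketch only, this class does not exist in openhab-core: a base class for
 * interpreters that just forward text to items/rules. The voice manager could check
 * "interpreter instanceof AbstractRuleInterpreter" and hand over the dialog identity.
 */
public abstract class AbstractRuleInterpreter implements HumanLanguageInterpreter {

    /** Overloaded interpret method carrying the dialog's audio source and sink ids. */
    public abstract String interpret(Locale locale, String text, @Nullable String dialogSourceId,
            @Nullable String dialogSinkId) throws InterpretationException;

    @Override
    public String interpret(Locale locale, String text) throws InterpretationException {
        // No dialog context available (e.g. text interpreted from the console): fall back gracefully.
        return interpret(locale, text, null, null);
    }
}
```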
To be honest, I don't know if using item metadata to store the dialog processor context is a clean solution or not. It should be discussed with the OH core maintainers. What seems a bit tricky to me is attaching a dialog processor context to the item just because you would like to use it in a user rule triggered by this item.
Your problem, finally, is that while all the logic should be in the human language interpreter, you would like to split it, with one part in the human language interpreter and another part in additional user rules. That looks very tricky.
As I understand it, your need is to have the rule say specific feedback. Isn't that something the HLI could do directly? Maybe a concrete use case would help to understand. And why not simply use the say action in your rule?
I'm using the say action but I need to indicate the correct sink.
A simplified version of the problem would be:
I have 2 rooms, A and B, and in each of them I have one speaker. Let's say the speaker in room A is registered in openHAB as SourceA and SinkA, and the speaker in room B as SourceB and SinkB, both with an active dialog and waiting for the keyword to start. I have configured the Rule Interpreter in openHAB as the default interpreter and selected the item ExampleVoiceCommand as its target, then I have created a rule which is triggered when this item changes. From the rule I check the new value of the item; if the text is "good morning" I want to give a response message, mixing data from different items like temperature inside, temperature outside... into a human readable phrase, and send this information back to the user through audio. I can speak through a sink using the "say" action inside the rule. The problem is that from inside the rule I don't know whether I have to use SinkA or SinkB to respond to the user, as there is no context about the dialog that originated the item change.
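To show where it breaks down, here is that last step written against the core VoiceManager API (a DSL rule would call the equivalent say action); the responder class and the temperature handling are only placeholders, SinkA/SinkB are the sinks from the example above:

```java
import org.openhab.core.voice.VoiceManager;

/**
 * Illustration of the missing piece: the answer text is known, but nothing in the
 * triggering item update tells the rule which dialog (and therefore which sink)
 * produced the command.
 */
public class GoodMorningResponder {

    private final VoiceManager voiceManager;

    public GoodMorningResponder(VoiceManager voiceManager) {
        this.voiceManager = voiceManager;
    }

    public void respond(String insideTemperature, String outsideTemperature) {
        String answer = "Good morning, it is " + insideTemperature + " inside and "
                + outsideTemperature + " outside.";
        // The open question: SinkA or SinkB? Without dialog context the sink has to be
        // hard-coded or guessed.
        String sinkId = "SinkA"; // or "SinkB"?
        voiceManager.say(answer, null, sinkId); // null = default TTS voice
    }
}
```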
I will update the first comment to contain this example, maybe it simplifies things. Thank you for your feedback @lolodomo!!
Is this now closed by #3142?
Yes, ty!