Initial implemenation of room associations and audio processing
This PR implements a new mode of operation: the per-room processing. The idea is that instead of matching a command to run from the message content it is matched against the name of the room in with the message is received. Just as with the "normal" command regex-matching, any message can only be matched against one command (even if multiple regex configs match to the room name).
Another major part of this PR is that not only text messages can be processed (because they are no longer required for command matching), but also audio messages can be "processed". By processed I mean that the (base64-encoded) audio data is passed to the handling command (just as with "regular" commands).
My primary use case: Use matrix as the communication channel from my phone to send messages to matrix-eno-bot running at home in order to do "stuff" in my smarthome environment. And because voice is an easier to use interface when using a phone, it also passes voice messages to the configured commands (which are responsible for STT'ing the audio data).
Hi @juergenpabel
Excellent. Fantastic for this great piece of work.
Is there a type in your text above? Instead of "any message can only be matched against one command" did you mean to say: "any message can only be matched against one room"?
In all cases at most one room will "fire"? The first matching room will "fire" and its configured command will be executed.
Also to confirm: For a given room X always the same command will be executed, correct?
bot_commands.py line 47 has a typo (wrong class name in comment)
Why a separate rooms.yaml file? Now there are 2 config files: rooms.yaml and commands.yaml.
Can't the rooms.yaml file become the new version of the commands.yaml file?
Simpler? Less confusion?
If the rooms.yaml is rename to commands.yaml, then if the "rooms-config-section is all commented out it works as now, and for people that want to take advantage of your new code, they uncomment and adjust the rooms-config-section of the file.
Does that make sense?
If there is no room-match found then it will do command-match.
What do you think @juergenpabel ?
Is there a type in your text above? Instead of "any message can only be matched against one command" did you mean to say: "any message can only be matched against one room"?
Yes: The essential content (text/audio) message handling (up to the point of what to do with it) is left as-is, so if a message is received, it checks against the room_dict elements' regex and the first match handles that message. So the command (cmd config) is passed that message for handling.
In all cases at most one room will "fire"? The first matching room will "fire" and its configured command will be executed.
Yes...and maybe no (only rare corner case): I think in matrix it is possible to send a single message (identified by its ID) to multiple rooms, so that one message (with the same ID) would appear in multiple rooms. If more than one room is part of the room-config here, I think this "one" message will fire multiple times - one time for each room. I haven't tested that, though.
Also to confirm: For a given room X always the same command will be executed, correct?
Yes. So for example, I have one room configured to run an openhab script, passing the input to openhab's API (this one wouldn't work with voice messages because my client doesn't handle that). Another room is for passing (primarily voice-messages) to a smart home digital assistant (https://github.com/kalliope-project/kalliope) - so that command invokes kalliope API with voice data for voice messages (kalliope speech-to-text's that first) or passing the content from a text message, so that kalliope skips the speech-to-text'ing. In any case, kalliope does what it's told (hopefully) - and returns the output (like: OK) and that's than the bots answer to the request).
This setup allows me to access my (FOSS based and entirely non-cloud) smarthome from my phone when not at home without exposing any services to the internet (think TCP port exposure). Given that I use private matrix rooms with encryption/signatures and trust-level verification, I get the best of both worlds (local and remote) but without exposing any of my infrastructure.
Why a separate rooms.yaml file? Now there are 2 config files: rooms.yaml and commands.yaml.
The regex parsing in commands.yaml is applied to the message, the regex from rooms.yaml is applied to the room from which the message was received. Technically, we could merge those configs - but semantically, they are implement different behaviour.
Can't the rooms.yaml file become the new version of the commands.yaml file? But then you'd loose the do-as-i-say behaviour that was originally implemented. I think for many users that's really a more approachable usage scenario.
Simpler? Less confusion? See above: technical vs. semantical view
If the rooms.yaml is rename to commands.yaml, then if the "rooms-config-section is all commented out it works as now, and for people that want to take advantage of your new code, they uncomment and adjust the rooms-config-section of the file. Yes, and they can both be used at the same time (at least as long as the combined logic doesn't cause any unexpected behaviour.
Does that make sense? I think - but should be able to defer that from my answers (if they actually answer - or at least address - your questions or not)
If there is no room-match found then it will do command-match. Yes.
Yes...and maybe no (only rare corner case): I think in matrix it is possible to send a single message (identified by its ID) to multiple rooms, so that one message (with the same ID) would appear in multiple rooms. If more than one room is part of the room-config here, I think this "one" message will fire multiple times - one time for each room. I haven't tested that, though.
And on second thought: This wouldn't be the case if room encryption is used - so only in case of unencrypted rooms (if at all, I haven't cross-checked the matrix spec - but I think to remember this from reading the spec quite some time ago). I might be wrong after all - but this explanation was more about the handling logic and not about any specific behaviour anyhow.
Love your use-case! Very nice. And your PR certainly adds value and new features ... I am very happy. :heart:
Still I have a couple of questions, just to clarify my thinking:
In the current PR, if there is no matching room, the commands-section will be used from the commands.yaml file. Correct?
In the current PR, if there is a matching room, the rooms-section from the rooms.yaml file will be use, and the commands-section of the rooms.yaml file will NOT be used. Correct?
Furthermore, am I correct in saying, in the current PR, the commands-section of the rooms.yaml file will NEVER be used?
If the answer is "YES" then it seems to me that clearly there should be only 1 yaml file: having both the commands-section (used when no room matches) and the commands-section (used when no room matches).
In the current PR, if there is no matching room, the commands-section will be used from the commands.yaml file. Correct? Yes, the handling logic in Command.process() is now
- if
- else if
- else if
- else
In the current PR, if there is a matching room, the rooms-section from the rooms.yaml file will be use, and the commands-section of the rooms.yaml file will NOT be used. Correct?
Ahhh - I think I now understand why you're asking: in rooms.yaml.example I (after the rooms config example) did not remove the command configuration. You are asking what's the purpose of that configuration, right? I just didn't remove it after copying the commands.yaml.example to rooms.yaml.example. I'll change that. Please confirm, if this is the "cause" of your questions - this one and some of the previous ones (or else explain further what we're discussing).
Furthermore, am I correct in saying, in the current PR, the commands-section of the rooms.yaml file will NEVER be used?
Yes, see above.
If the answer is "YES" then it seems to me that clearly there should be only 1 yaml file: having both the commands-section (used when no room matches) and the commands-section (used when no room matches).
That would indeed than be an alternative implementation approach. Do you prefer to "merge" both config "worlds"? In any case, let me "correct" the rooms.yaml.example first - so that than merging both into one would be a cleaner merge.
Yes, I was puzzled why there is a commands-section (after the rooms-section) in the rooms.yaml file. Yes, this is what "confused" me.
I think now we look at this from the same angle.
That would indeed than be an alternative implementation approach. Do you prefer to "merge" both config "worlds"? In any case, let me "correct" the rooms.yaml.example first - so that than merging both into one would be a cleaner merge.
Yes, I prefer 1 (merged) config file, that has both sections: first room-section (as this is the first priority) and then below a commands-section (as this is the second priority after room).
I think 1 yaml config file is better, because simple, all visible in a single file, 1 file less, etc, etc.
What do you think @juergenpabel ?
I think 1 yaml config file is better, because simple, all visible in a single file, 1 file less, etc, etc.
What do you think @juergenpabel ?
Honestly: I don't care - it's left-and-right alternative with no obvious "better" option. You prefer the merged config, so let's do that.
But: I just looked at the provided example config + scripts - the things like alert.sh, backup.sh and such. What would be a good example config+ scripts for the room config? from the existing scripts, only the RSS one appeals to me - but with a different "usage"; instead of preconfigured RSS feeds ("affaires","andreasm", ...) this script should fetch feed items based on a RSS url provided via a message. So this - let's call it "room-rss.sh" script would get passed a URL (taken from a received message) and then respond with the feed items. The other scripts really don't have such a parameterization that would be actually useful to users.
So: Do you want a functional rooms config in the .example file? In the sense that the invoked scripts should actually do what's documented in the example file? I'm asking because:
- I can't come up with good example functionality to implement that is generic (doesn't require anything on the user's side like an openhab server) and useful at the same time. The rss case is the best (generic one) that I can come up at the time.
- I wouldn't want to implement/adapt these "example" scripts - that tends to be more work than just a quick copy+edit to make it actually useful+funcational
I think the example does not need to be functional. It should just be well documented. The reader should be able to say "ah, ok, I get it".
By default, the rooms-section will be commented out, right?
Also: any user would have to join their bot to the configured rooms in order to get a working setup. The only "generic" option here would be to provide some public example rooms created by you (as the project maintainer) - but that would be a security risk par-excellence: anyone in this public room would be able to send commands to the user's bot instance...if they just used the provided example config.
And: Right now no "join this room from the room config" logic is implemented. The current logic only matches against the rooms display-name, however the bot has joined that room is currently out-of-scope. Implementing that part would be easy - but would it be really reasonable to do that (=> security-risk)? I have my doubts...
One dumb idea for an example would be : room name: archive-room command: add_message_to-archive.sh
and of course the script add_message_to-archive.sh is NOT implemented.
Also: any user would have to join their bot to the configured rooms in order to get a working setup. The only "generic" option here would be to provide some public example rooms created by you - but that would be a security risk par-excellence: anyone in this public room would be able to send commands to the user's bot instance...if they just used the provided example config.
And: Right now no "join this room from the room config" logic is implemented. The current logic only matches against the rooms display-name, however the bot has joined that room is currently out-of-scope. Implementing that part would be easy - but would it be really reasonable to do that (=> security-risk).
User must manually set up rooms, join rooms, etc. Agreed.
Another dumb idea for an example would be : room name: rss-feed-room command: add_message_to_rss_feed.sh
and of course the script add_message_to_rss_feed.sh is NOT implemented.
One dumb idea for an example would be : room name: archive-room command: add_message_to-archive.sh
and of course the script
add_message_to-archive.shis NOT implemented.
After thinking about it for a moment: the "logical" common denominator to the rooms feature is that the command's logic is to pass the message (command) on to another system. Like to openhab or kalliope in my setup. It's more of a proxy-like setup than a direct do-something-useful-yourself setup.
Yes, use your own use case, something like:
room name: audio-room command: run_openhab_to_process_attached_audio.sh
and of course the script run_openhab_to_process_attached_audio.sh is NOT implemented.
With a few examples like this AND some additional comments I hope the reader/user will get the idea of what can be done.
With a few examples like this AND some additional comments I hope the reader/user will get the idea of what can be done.
Yes - and with the conclusion that users must do quite a bit of setup-work to get a functional setup (create room, join bot, create/adapt scripts and create that matching config), a separate rooms.yaml.example wouldn't actually be an example config, because it wouldn't actually have any config (only documentation); so...joining this feature into commands.yaml (and thus and adapted commands.yaml.example) would be better - because that example config file could still be used as a starting config file (just without the rooms feature for the aforementioned reaons).
I agree to your above stmt.
2 more examples occurred to me:
room name: english-to-french-translation-room command: translate-en2fr.sh
room name: speech-to-text-room command: convert-audio-attachment-to-text.sh
room name: english-to-french-translation-room command: translate-en2fr.sh
That's an excellent room use-case - and also one that can be easily "multiplied": one room for translation to french, another one to german, etc. If we now had a translation service that was usable without registration (in order to minimize the additional setup work required)....
I found: https://libretranslate.com/ At least for sporadic usage, their API appears to be usable without an API key. I think we just found a very good example, because the config+scripts could easily be created into a functional state. The only setup work for a user would be the room creation and bot joining. I like.
room name: speech-to-text-room command: convert-audio-attachment-to-text.sh
Also a good one - I am not aware of any "open" (without registration) online translation services, though.
I have another aspect to discuss: The invocation of the commands/scripts. Your implementation (non-room) is to pass the data (message content) to the script as a command line argument, obviously with audio data that wouldn't work. That's why I adapted that behaviour to write the (base64-encoded audio) data for rooms to STDIN of the executed command.
The question here would be whether how data is passed on to the script should be different for non-room commands and room-commands. I would prefer that they'd be the same - but that would require a rework of a) all the provided example scripts and b) any existing user installations. Of cource we could make this behaviour configurable and the default for non-rooms would be by execution argument. In the long run however, I'd find it more reasonable to just use one mechanism (at least by default).
And another minor aspect: My current changes in the Command class don't differentiate between room and non-room invocations, so the message content is written to STDIN in Command._os_cmd():
run = subprocess.Popen(
argv_list, # list of argv
+ stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True,
env=new_env,
)
+ run.stdin.write( self.command )
That's a side effect for existing scripts, but because they don't expect anything on stdin anyhow (is stdin even closed() if not passEd by kwarg to Popen?), it shouldn't be an issue. Honestly: I haven't tested that with your example scripts, though.
While looking at the source (in preparation for merging rooms into command) I noticed that I hadn't implemented audio processing in non-encrypted rooms - now it is (and: generalized the media handling, so that not only OGG can be passed on to the scripts - they'd have to handle it though according to the given mimetype)
Re: STDIN vs command-line-argument
I had not thought about that. Yes, you are correct, this is an issue.
We should definitely be backward compatible. No changes to current users!
Only way forward: support both. Input via a) STDIN and b) via command-line-argument.
I think the user should chose. Just like the YAML file has the following for each room and each command:
markdown_convert: false
formatted: true
Let's add:
# true ... input is provided via stdin pipe; false...input is provided as command line argument
input_via_stdin: true
If the default is input_via_stdin: false then current users have nothing to do.
In rooms: that e.g. do audio processing, this can be set to input_via_stdin: true.
But other rooms, e.g. that do text-translation this could be set to input_via_stdin: false.
What I am trying to say is: Not all users that use rooms need to get input via STDIN.
What do you think?
I know, I know, sorry, this is getting more and more work for you. But, at the same time your PR is getting also better and better :smile:
I know, I know, sorry, this is getting more and more work for you. But, at the same time your PR is getting also better and better smile
As long as it remains "planable" (I can forsee the amount of work required), I'm fine with it - that's why I wouldn't want to adapt all the example scripts as mentioned earlier. And yes, I think the improvements are well worth it.
But I just realized another "complication": in the rooms configuration (rooms_dict.py) I added an args config option. The idea is that because the "data" is passed via STDIN, that configuration data can be passed by parameter. Like for the openhab script the IP address of the server or auth data. In terms of enabling non-environment-specific scripts (with hardcoded IP adresses, etc), some form of config data passing must be available. Via parameters appears to be the most common approach - another alternative would be to implement some form of protocol for STDIN, but that would induce more parsing work for script creators to implement.
So either the data is appended to any configured parameters like so:
openhab.py --server 192.168.1.1 "Bedroom_Light=off"
where "Bedroom_Light=off" was the message content and "--server 192.168.1.1" was provided in the config for that room/command/script.
Technically, I'd be fine with appending the message data to other (configured) parameters OR passing it via STDIN; from a users perspective I think it'd be more approachable, if only one default mechanism existed for each type of config (non-room vs. room): and with audio (or even other mimetypes) that would mean STDIN for rooms. For non-rooms that question doesn't arise because the message content is used to map to the command (whereas in room scenarios that mapping is done via the rooms name).
But I just realized another "complication": in the rooms configuration (rooms_dict.py) I added an args config option.
I did not see that in the source code given. That is in your personal local version?
I did not see that in the source code given. That is in your personal local version?
Take a look at RoomDict. get_opt_args() -> in Command.process() the invocation of _os_cmd() for rooms has kw args=self.room_dict.get_opt_args(matched_cmd) whereas for non-rooms it is args=self.args.
I think the simplest approach is the best option, certainly for now.
OK, the following will NOT work, it will give error (if no special parsing to cmd is added). It will look for an executable named openhab.py --server 192.168.1.1 Bedroom_Light=off
#rooms:
# room1:
# regex: "^Room: 1$|One$"
# cmd: "openhab.py --server 192.168.1.1 Bedroom_Light=off"
# markdown_convert: false
# formatted: true
# code: false
#
How about this? A list of args in yaml for the extra arguments.
#rooms:
# room1:
# regex: "^Room: 1$|One$"
# cmd: openhab.py
# give 3 extra arguments to this command
# extra-args: ["--server", "192.168.1.1", "Bedroom_Light=off"]
# markdown_convert: false
# formatted: true
# code: false
#
The code must then append this extra-args list to the real argument list before the os_cmd() call.