matrix-commander
matrix-commander copied to clipboard
Enhancement: print the source event, output data or event in JSON instead of plain text format
--listen
and --tail
print events in a human-readable format. I want the json, to parse it easily in my script. Could matrix-commander print the json by default or with a command-line flag?
Local patch
On my test machine, I replaced line 2053
print(complete_msg, flush=True)
with
print(json.dumps(event.source))
This prints one line for each event. My script reads each line and parses with json.loads()
.
Does event.source
always esist in all event types received from the callback function? I'm asking because the change works on my machine for my limited, interactive usage, but I'd like to be sure for automated usage and for sharing the patch with other people.
Note on newlines: json.dumps() defaults to indent=None
which selects the most compact representation
(in reference to newlines) according to the docs
To parse the lines from matrix-commander I am using:
def get_latest_events(*, room_id: str, tail: int=100) -> list[dict]:
"""
Requires this patch to matrix-commander::
https://github.com/8go/matrix-commander/issues/92#issuecomment-1232843897
:return: Events from the latest to the oldest.
"""
import collections, json, subprocess
args = ['matrix-commander', '--tail', str(tail), '--listen-self', '--room', room_id]
mc_stdout = subprocess.run(args, capture_output=True).stdout
mc_lines = mc_stdout.decode('utf-8').splitlines()
return [json.loads(line) for line in mc_lines]
I've made some very basic implementation which uses the flag --output-json
It's not yet incomplete but I was driven by the same idea wanting to have JSON output.
Maybe with a bit of work this can be turned into a PR
https://github.com/HoboDev/matrix-commander/tree/feat-output-json
@HoboDev That's unrelated. This issue is for the data structure returned verbatim from the server. A Matrix room is a list of json objects that follow the Client-Server API specification and matrix-commander has the json inside event.source
.
A client should use the Client-Server specification to parse matrix-commander output into anything.
Instead your code parses the standard json into a non-standard json with limited information. This is a maintainance nightmare in the long term, because it ties the downloader with the parser, the Client-Server spec is changing, and anyone will request their fields of interest.
@HoboDev Could you write a library to parse the Client-Server json (= wanted matrix-commander output) into a high-level data structure (list[Message]
where each Message has text_revisions
, reactions
/annotations
, in reply to
, ...)? I don't know one for Python.
I've replaced the very same line you have changed to get json output and added some more code. Looking at the keywords "json output" I fail to see how this isn't related. Aren't we talking about json output as the common denominator?
What I did is more like a rough draft or proof of concept since I didn't know which specification to follow and didn't have time for going into details. I don't recall that the code I've added parses anything that must be the existing code. I'm just using some variables to get the info and put it into a structure. Of course this can always be changed to a standard. Which obviously is more sensible.
Unfortunately I've seen your issue after I was done with my draft. Would have been easier to start like this.
I guess I could write a library but neither am I well versed in python nor do I have the enough free time for digging into this. Maybe as I tag along and more knowledge builds up the path will shorten. For this endeavor we'd need the desired output from matrix-commander first anyway. Also it would make things a lot easier. But currently I just need the json output and am looking for a sensible solution to implement that.
So what is your recommendation? Just outputting all json data as it arrives?
That shouldn't take too long and would be quite an easy pull request if it's accompanied by the right flag and documentation.
What about matrix-commanders info messages? I've rerouted the first line to stdErr. The same could be done for all info messages (if there are any, e.g. --debug
messages).
Another thing just came through my mind (because I don't really know the spec and didn't have time to look at the resulting data):
Wouldn't it be more sensible to output the whole event?
Let's talk flags... E.g.
--print-raw-event
--print-raw-event-source
Then we could have both. The spec would remain unchanged and the next program could sort it out.
Hi @opk12 and @HoboDev
Thank you for your valuable input.
You make valid points. I also see that there are several issues, all somewhat related.
Let me think a bit about it, maybe the best is an incremental approach. Get the JSON event data published for listening, thereafter step-by-step add other event/JSON related output issues.
I want to centralize all JSON-related, event-data, raw-data, output-format etc. discussion in one place, here in Issue #92.
Hence I closed Issue #95.
@HoboDev , @opk12
Please check out the new release and the --output
feature, currently implemented for --joined-members
and --joined-rooms
.
Test it and give feedback.
Is that what you expected? Can it be improved?
@HoboDev wrote this in Issue #94
I have started a basic implementation of --output-json to get JSON on stdout, so I don't have to deconstruct the message lines. You can check it out here: https://github.com/HoboDev/matrix-commander/tree/feat-output-json
It is lacking a bit on documentation and maybe does not adhere too much to existing variable names. But at least it's a start. So far I've covered basic messages and media files.
I moved it over here because I want all the discussion to be in the same place, here in Issue #92.
A question I have: --joined-members "*"
in text mode produces multiple lines as output, one line for each room.
In JSON, should that be
- 1 line with 1 JSON array (with N objects in it) or
- N lines each one with 1 JSON object?
If you run m-c
now with --joined-members "*"
you will see that it returns 1 line with 1 JSON array (with N objects).
Is that the best?
Same question will come up once received messages are done. Should it output 1 line with 1 JSON object for each message read, or just a total of 1 line with a JSON array of all N messages?
What do you think? I was leaning towards just returning 1 single array object. The user can then use jq
or something else to go thru the array.
I'd follow the standard Unix practice of printing one object on one line.
- For
--joined-members
, the object of interest is a member, and there is no pro in emitting JSON instead of@user:homeserver
, unless you plan to add fields in the future and document that the returned JSON may have fields added. Edit: then a separate call can get the user info from the ID. - For messages, the object of interest is the event as returned from the homeserver.
The purpose of the Unix way is that a client can read an object as soon as it's downloaded. Otherwise
- a client that only needs some objects, or decides to terminate sooner (error etc), must wait to download everything, and fails for a network failure while downloading unnecessary events
- it is inefficient to parse a very huge string (many MB) then navigate through a very huge array. It can also be painful when the system is low on memory or on a low end device (think Mobian).
- The standard Unix tools can't be applied, you must be a programmer (not just a power user) and learn jq, and not all systems have jq available. For example, with the patch above that prints one json per line, I am filtering the jsons with
grep
andhead
(no parsing) to get all the mxc:// URLs for a specific room.
The Unix way also has the nice property of SIGPIPE: matrix-commander | client
is optimized by the OS, by terminating matrix-commander
as soon as client
terminates. So the download is stopped automatically as soon as client
gets the necessary material. This requires that messages are sent to client
as soon as they are available.
@opk12 thanks for the input and feedback.
Anyone else, any more feedback?
Option 1 , currently implemented, output is 1 line with 1 JSON array (with N objects in it)
matrix-commander --joined-members "\!rrr:xxx.sytes.net" "\!sss:xxx.xxx.xxx" --output rAW | jq
[
{
"members": [
{
"user_id": "@u1:xxx.xxx.xxx",
"display_name": "u1",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
}
],
"room_id": "!rrr:xxx.xxx.xxx"
},
{
"members": [
{
"user_id": "@u1:xxx.xxx.xxx",
"display_name": "u1",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
},
{
"user_id": "@u2:xxx.xxx.xxx",
"display_name": "u2",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
}
],
"room_id": "!sss:xxx.xxx.xxx"
}
]
Option 2 , currently not implemented, output is N lines each one with 1 JSON object; preferred by @opk12 as explained above
matrix-commander --joined-members "\!rrr:xxx.sytes.net" "\!sss:xxx.xxx.xxx" --output rAW | jq
{
"members": [
{
"user_id": "@u1:xxx.xxx.xxx",
"display_name": "u1",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
}
],
"room_id": "!rrr:xxx.xxx.xxx"
}
{
"members": [
{
"user_id": "@u1:xxx.xxx.xxx",
"display_name": "u1",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
},
{
"user_id": "@u2:xxx.xxx.xxx",
"display_name": "u2",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
}
],
"room_id": "!sss:xxx.xxx.xxx"
}
I changed the code to follow @opk12 advice. Now the Option 2 is implemented.
Now a separate line and a separate JSON object is printed for each data-set (e.g. message received, room-member found, etc.)
Test it with --listen
, --tail
or --joined-members
and --joined-rooms
.
E.g. the new code delivers this:
matrix-commander --joined-members "\!rrr:xxx.sytes.net" "\!sss:xxx.xxx.xxx" --output raw | jq
{
"members": [
{
"user_id": "@u1:xxx.xxx.xxx",
"display_name": "u1",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
}
],
"room_id": "!rrr:xxx.xxx.xxx"
}
{
"members": [
{
"user_id": "@u1:xxx.xxx.xxx",
"display_name": "u1",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
},
{
"user_id": "@u2:xxx.xxx.xxx",
"display_name": "u2",
"avatar_url": "mxc://xxx.xxx.xxx/zzz"
}
],
"room_id": "!sss:xxx.xxx.xxx"
}
Have a look at the new release. --output
now covers --listen
and --tail
.
Try something like --tail 1 --listen-self --output json-max | jq
or --tail 1 --listen-self --output json | jq
@opk12 Your 20-line Python program (posted above) to read messages should work. I have not tested it, but please report back.
Wow, I was literally looking for how to do this and saw your release 4 hours ago. Thank you!
@8go My use case is specifically about having spec-defined events:
- when the spec changes, I should not wait for my program to break, report on GitHub, wait for m-c to implement, test the m-c update (pray without bugs), re-deploy in production
- programmers should not try the option to learn what is output, or guess about what 'close to matrix-nio data' can mean
- programmers need the guarantee that a future implementation does not change the output format
In other words, my use case is specifically about printing the source event unaltered, as returned verbatim from the Matrix homeserver without any kind of manipulation. It's not about collecting the fields and making a new dict, it's about treating the matrix-nio event object as opaque and passing it on as-is to the client.
Another point is guaranteed interoperability with other libraries in the Matrix ecosystem, i.e.
- it's not about having the fields in the output, or my personal trust in the m-c implementation quality, it's being able to prove the correctness of my client code to the future myself / dev team lead / ... by quoting the m-c manpage that says that the output follows the spec
- if I copy a function from another FLOSS library into my software, it will manipulate spec events, and passing m-c output to it should work de jure (not as a de facto "coincidence")
To be clear, experience shows that, if one needs to touch the Matrix events because an easy interface is not sufficient, learning the spec is the best favor they can do to themselves in the long term.
OK, @opk12 , your point is valid. But ...
Let's take an example: --get-room-info
. --get-room-info
a) gets the room object from matrix-nio
and
b) adds display_name
obtained via matrix-nio API call display_name
and
c) outputs the combined Json object (which includes the display_name
)
Observation:
a) Does the matrix-nio
give me a room object the follows the official Matrix spec? Hopefully? But I do not know for sure, I have not compared the matrix-nio
code with the spec.
b) display_name
is not in the matrix-nio
object. There is name
. There is group_name
, etc. Calling the display_name
method in the nio API, it computes the display_name
based on name
, groupname
, etc. It seems convenient to have the display_name
readily available without having to making another m-c
call.
In summary,
- there are a few instances where one or two fields are added (like
display_name
in the example above). - nothing is ever removed, except
transport_response
forjson
becausetransport_response
is not not the data, it is about how the data was obtained and transported.
How would you improve that?
The description now reads:
--output OUTPUT This option decides on how the output is presented. Currently offered choices are: 'text', 'json' and 'json-max'.
Provide one of these choices. The default is 'text'. If you want to use the default, then there is no need to use
this option. If you have chosen 'text', the output will be formatted with the intention to be consumed by humans,
i.e. readable text. If you have chosen 'json', the output will be formatted as JSON. The content of the JSON
object matches the data provided by the matrix-nio API. In some occassions the output is enhanced by having added
a few data items for convenience. These convenient data items are added to the data from matrix-nio. In most cases
the output will be processed by other programs rather than read by humans. Option 'json-max' is practically the
same as 'json', but yet another additional field has been added. The data item 'transport_response' which
gives information on how the data was obtained and transported is being added.
Props for the big work and effort. I am following the releases and I see the pervasive code changes.
I personally need a room's events (--listen
and --tail
) for piping into a spec-compliant parser. This was lost in the thread above, sorry for not clarifying that I referred to the room events today.
This is just a 2-line change, Callbacks.message_callback()
doing if gs.pa.output == '--speced-raw-event-from-homeserver': print(json.dumps(original event))
(without obj_to_dict
and custom fields, to spare me a compatibility layer around m-c that depends on m-c implementation details).
How about a --output json-spec
that is equal to json
(for the time being) except for the message callback.
?
I kind of foresaw that some day something like this would happen (I didnt think it would happen so soon, hahaha), that is why it is not a binary on-off --json
but a --output many-sub-options
.
@opk12 try out --output json-spec
Thank you! Here are some points.
-
Is there a reason now for
json-max
andjson
to be separate? -
Shouldn't
json-spec
be defined in the help text in terms of the C-S spec? Non-programmer power users, or programmers who know Matrix but not Python, won't even try it if they read "matrix-nio". -
json-spec
does not really make sense for--get-room-info
and other queries that are not for room events. It feels weird to write-spec
and get stuff from matrix-nio (one of many libraries).
About -spec
: Reading the commit, it feels like having an umbrella "matrix-nio stuff" option for conceptually different things (room events and matrix-nio objects) could be caused by my other feature request, of calling the m-c functions to avoid reimplementing the same snippets around matrix-nio. But it's really as simple as that (= function call) because serialization (= json.dump of matrix-nio objects) is contrary to the feature:
- deserializing requires me more effort than calling or copy/pasting a snippet from matrix-commander
- while the output is understandable by a human, it is unusable programmatically, because
__dict__
shows the implementation-dependent private fields (so my app cannot rely on it) and because__dict__
won't output properties (= fields that are not stored in the object, but computed on every call) - proper de/serialization needs dedicated methods in matrix-nio (merely exchanging fields won't make it)
Maybe there is some use case that I am missing, but why printing matrix-nio objects at all in the -spec
case.
- Some find it convenient to shorten
if a == A1 or a == A2
to the idiomaticif a in (A1, A2, A3)
@opk12 Regarding:
- Is there a reason now for json-max and json to be separate?
Only reason is that json-max
also gives transport-response
but fwe people will care about that
- Shouldn't json-spec be defined in the help text in terms of the C-S spec? Non-programmer power users, or programmers who know Matrix but not Python, won't even try it if they read "matrix-nio".
Yes, It would make sense to define/document json-spec
as "according to specification....".
- json-spec does not really make sense for --get-room-info and other queries that are not for room events. It feels weird to write -spec and get stuff from matrix-nio (one of many libraries).
Are you suggesting that json-spec
should only print the message events in --listen
and --tail
but be silent (print nothing) in all other calls (e.g. --get-room-info
) ? The message event when receiving messages is the only "as-spec" event, correct? Or is there another one ?
4> Some find it convenient to shorten if a == A1 or a == A2 to the idiomatic if a in (A1, A2, A3)
Thanks for pointing that out, easy to fix.
Question: --tail 10 -y --output json-spec
: This is working for you? Yes/No? Working as you expected?
- json-spec does not really make sense for --get-room-info and other queries that are not for room events. It feels weird to write -spec and get stuff from matrix-nio (one of many libraries).
Are you suggesting that
json-spec
should only print the message events in--listen
and--tail
but be silent (print nothing) in all other calls (e.g.--get-room-info
) ?
I suggest that it raises an error: another option was meant (it is a typo), or, to silence output, the usual shell redirection > /dev/null
would be more appropriate.
The message event when receiving messages is the only "as-spec" event, correct? Or is there another one ?
As far as I know (I studied room events), there are no other kinds of events (the spec says "event" and "room event" interchangeably) or other cases for --json-spec
. Taking your example of --get-room-info
: it merges information from different HTTP calls and I don't know a single Matrix object that represents a room. If the spec adds such an object in the future, it will be the perfect match for --json-spec
, so giving an error now gives flexibility in the future (no backward compatibility to constrain the output).
Question:
--tail 10 -y --output json-spec
: This is working for you? Yes/No? Working as you expected?
Sure, it works and I migrated my room dump script!
Another code simplification is possible. Rather than making the output inside an if, all outputs can be made in advance. Adding other --output
s in the future will not make a lot of unmanageable nested if
s, but just one new line another_output = json_ | ...
For example, this
if gs.pa.output in ...:
dic = resp.__dict__
if gs.pa.output == OUTPUT_JSON:
...
can become (json_
has the underscore to avoid a name clash with the module json
)
json_ = {...}
json_max = json_ | {'new key': ...}
json_other = json_.copy()
json_other.pop('key to remove')
json_spec = ...
print_event_depending_on_option(gs.pa.output, text=..., json_=json_, json_max=json_max, json_spec=json_spec)
def print_event_depending_on_option(option: Literal['text', 'json', 'json-max', 'json-spec'], *, text: str, json_: dict=None, json_max: dict=None, json_spec: dict=None) -> None:
results = {
OUTPUT_TEXT: text,
OUTPUT_JSON: json_,
OUTPUT_JSON_MAX: json_max,
OUTPUT_JSON_SPEC: json_spec,
}
print(results[gs.pa.output])