seqcli Ingesting logs from `journalctl` as json, and parsing timestamps

I was pointed at seqcli which apparently could be used in a tailing fashion with journalctl to read journald logs. I tried it out, and instead of reading and parsing text, I went the route with json parsing (to get all the structured logs goodies).

I came up with this:

journalctl -f -o json | \
  jq -c '. + {
    "@m":.MESSAGE,
    "host":._HOSTNAME,
    "@l":(.PRIORITY|tonumber),
    "@t":(.__REALTIME_TIMESTAMP|tonumber)} | 
    del(.MESSAGE, ._HOSTNAME,.PRIORITY,.__REALTIME_TIMESTAMP)
  ' | seqcli ingest --json

There are a few points here:

- The timestamp is rejected by seqcli ingest, with: The value of @t on line 1 is not in a supported format.
  - I could format the timestamp (Unix microseconds) as a datetime string, but I think that's a waste.
- Should I lowercase all properties?
- I couldn't pass in JSON unless it was newline-delimited. I kind of expected that to work. Compact is better anyway, but some future source might not produce compact output.
- I'll submit this as an alternative to your command in the docs when it's pretty :)

Aug 05 '22 10:08 LordMike

Hi @LordMike!

Thanks for sharing your notes. We'd love to have support for some kind of JSON field mapping in the future, which I think could ease some of this, but yes - right now you'll need to format the timestamp as ISO-8601 to have it accepted :+1:

I wouldn't necessarily rush to lowercase properties, unless you want to view them that way (or alongside other logs with a different convention). Seq doesn't really care which casing style you use :-)

JSON streams without delimiters would be a great extension to seqcli ingest; --json is taken (it'd probably be the best flag to use) but for now maybe we could look at --json-stream as an option?

(Newline delimiters are nice because they make it possible to recover from an invalid event; JSON streams without delimiters are harder to recover if there's some invalid/non-JSON data in the middle of them.)
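
To illustrate the recovery point, here is a minimal Python sketch of line-by-line parsing. It is not how seqcli is implemented; the function name `parse_ndjson` and the sample lines are made up for the example. Because each event sits on its own line, a line that fails to parse costs only that one event.

```python
import json


def parse_ndjson(lines):
    """Parse newline-delimited JSON, skipping lines that fail to parse."""
    events, skipped = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            # Only this line is lost; parsing resumes at the next newline.
            skipped.append(number)
    return events, skipped


# Hypothetical input: the second line is not valid JSON.
sample = [
    '{"@m": "first"}',
    'this line is not JSON',
    '{"@m": "third"}',
]
events, skipped = parse_ndjson(sample)
print(events)
print(skipped)
```

With an undelimited stream, the parser would have no equally obvious place to resume after the bad data.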

Thanks again!

Aug 06 '22 07:08 nblumhardt

Ah, I see. I'd lump it all together under --json, since a newline-delimited JSON value is the same JSON as a pretty-printed one (or however it's printed; whitespace is irrelevant compared to the tokens). But if you can use the newline as a recovery mechanism, then that's all good :P

I'm not in a rush to support anything but ndjson.

I'll try formatting it as ISO8601. Good of you to mention the format, it wasn't mentioned in the docs anywhere.

Aug 06 '22 20:08 LordMike

Can't find the docs in github - so how can I improve them? :)

I've ended on this:

journalctl -f -o json | \
  jq -c '. + {
    "@m": .MESSAGE,
    "host": ._HOSTNAME,
    "@l": ["emerg","alert","crit","err","warning","notice","info","debug"][.PRIORITY | tonumber],
    # Zero-pad the milliseconds so that 5 ms becomes ".005" rather than ".5"
    "@t": ((.__REALTIME_TIMESTAMP | tonumber / 1000000 | strftime("%Y-%m-%dT%H:%M:%S.")) +
           (.__REALTIME_TIMESTAMP | tonumber / 1000 % 1000 | tostring | "00" + . | .[-3:]) + "Z")
  } | del(.MESSAGE, ._HOSTNAME, .PRIORITY, .__REALTIME_TIMESTAMP)' | seqcli ingest --json

This:

- Translates the log level to a string using the eight syslog priority levels (0 to 7) defined here
- Translates the realtime clock timestamp (in Unix microseconds) into an ISO 8601 timestamp with millisecond precision
- Renames some properties to match Seq's well-known names
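
For readers who find the one-liner hard to follow, the same three steps can be written out long-hand. This is an illustrative Python sketch, not part of the pipeline above. The function name `journald_to_clef` and the sample entry are made up, and it assumes every entry carries `MESSAGE`, `_HOSTNAME`, `PRIORITY` and `__REALTIME_TIMESTAMP`, which real journal entries may not. It zero-pads the milliseconds to three digits so that 5 ms is written as `.005`.

```python
from datetime import datetime, timezone

# Syslog priority names, indexed by the numeric PRIORITY value (0 to 7).
SYSLOG_LEVELS = ["emerg", "alert", "crit", "err",
                 "warning", "notice", "info", "debug"]


def journald_to_clef(entry):
    """Return a copy of a journald JSON entry using @t/@l/@m property names."""
    event = dict(entry)

    # __REALTIME_TIMESTAMP is a string holding microseconds since the Unix epoch.
    micros = int(event.pop("__REALTIME_TIMESTAMP"))
    stamp = datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc)
    millis = micros // 1000 % 1000
    event["@t"] = stamp.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"

    # PRIORITY is a string such as "6"; map it to its syslog name.
    event["@l"] = SYSLOG_LEVELS[int(event.pop("PRIORITY"))]

    # Rename the remaining well-known fields.
    event["@m"] = event.pop("MESSAGE")
    event["host"] = event.pop("_HOSTNAME")
    return event


# Hypothetical journald entry, trimmed to the fields used here.
sample = {
    "MESSAGE": "Started Session 42 of user mike.",
    "_HOSTNAME": "box",
    "PRIORITY": "6",
    "__REALTIME_TIMESTAMP": "1659816480005000",
    "_SYSTEMD_UNIT": "init.scope",
}
result = journald_to_clef(sample)
print(result)
```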

Future readers should be aware that:

- This command makes no attempt to catch all logs. If you restart it, or reboot and run it after boot, it will not send logs captured between the last run of the command and "now".
- Likewise, if something fails in seqcli or elsewhere and an event cannot be sent, you will miss those logs.

Aug 06 '22 20:08 LordMike

Thanks @LordMike. I love a good jq solution.

Aug 07 '22 23:08 liammclennan

Thanks for sharing your notes. We'd love to have support for some kind of JSON field mapping in the future, which I think could ease some of this, but yes - right now you'll need to format the timestamp as ISO-8601 to have it accepted 👍

Actually, I imagined that seqcli would accept and interpret values like the following:

- If it's a string, assume ISO8601 (or some other set of formats)
- If it's a number:
  - If 1262304000 < N < 1262304000000, assume seconds (2010-01-01)
  - If 1262304000000 < N < 1262304000000000, assume milliseconds (2010-01-01)
  - If 1262304000000000 < N < 1262304000000000000, assume microseconds (2010-01-01)
- Other formats, like JavaScript dates (which are really just Unix timestamps wrapped in a function call)

I chose the cutoff to be 2010, but any number will do - there will be some overlap though for very early timestamps, where you cannot distinguish between the different precisions of unix timestamps.
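
As a rough illustration of that heuristic, here is a Python sketch. It is only a reading of the proposal, not anything seqcli does. The name `parse_timestamp` is made up, and the upper bound of each band is assumed to be 1000 times its lower bound.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CUTOFF_SECONDS = 1_262_304_000  # 2010-01-01T00:00:00Z in Unix seconds


def parse_timestamp(value):
    """Guess the precision of a Unix timestamp from its magnitude."""
    if isinstance(value, str):
        # Treat strings as ISO 8601; older Pythons do not accept a trailing "Z".
        return datetime.fromisoformat(value.replace("Z", "+00:00"))

    # Try seconds, then milliseconds, then microseconds.
    for units_per_second in (1, 1_000, 1_000_000):
        low = CUTOFF_SECONDS * units_per_second
        if low < value < low * 1000:
            micros = value * (1_000_000 // units_per_second)
            return EPOCH + timedelta(microseconds=micros)

    raise ValueError(f"cannot guess the precision of {value!r}")


print(parse_timestamp(1659816480))           # seconds
print(parse_timestamp(1659816480005))        # milliseconds
print(parse_timestamp(1659816480005000))     # microseconds
print(parse_timestamp("2022-08-06T20:08:00.005Z"))
```

Values at or below the 2010 cutoff are rejected here, and a millisecond or microsecond timestamp from before 2010 would land in the wrong band, which is the overlap mentioned above.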

Aug 09 '22 20:08 LordMike

Given that the current ingest functionality isn't designed with the kind of sophistication in mind that this sort of feature requires, we might be better off closing this one. I'd dearly love to start building some basic ingest pipeline functionality here though, of course :-)

Just on the CLEF format and lack of PR-able docs, I put this repo and spec together - https://github.com/clef-json/clef-json.github.io - the format is published at https://clef-json.org/.

Oct 20 '22 02:10 nblumhardt
