
idea: log job and node events directly to journald

Open grondo opened this issue 9 months ago • 7 comments

@morrone stopped by for a chat today and suggested an idea for handling streaming events from Flux, i.e. those currently supported by the JournalConsumer interfaces for the job manager and resource module journals.

Chris proposed sending events directly to the systemd journal, presumably using the native protocol. Consumers can then use journald APIs and commands to grab events without connecting to Flux, and the persistence problem is (presumably) foisted off to journald.
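For illustration only, here is a minimal sketch of what writing one event with journald's native protocol could look like: a datagram of `KEY=value` lines sent to journald's Unix socket, with a length-prefixed form for values that contain newlines. The `FLUX_JOB_ID` and `FLUX_EVENT_NAME` field names are hypothetical and not something Flux defines; the socket path is the default for the system journal.

```python
import socket
import struct

# Default datagram socket for the system journal.
JOURNAL_SOCKET = "/run/systemd/journal/socket"


def encode_entry(fields):
    """Serialize a dict of fields into one native-protocol datagram."""
    out = bytearray()
    for key, value in fields.items():
        name = key.encode("ascii")
        data = value if isinstance(value, bytes) else str(value).encode("utf-8")
        if b"\n" in data:
            # Binary-safe form: name, newline, 64-bit little-endian
            # length, the raw data, then a trailing newline.
            out += name + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            # Simple form: NAME=value terminated by a newline.
            out += name + b"=" + data + b"\n"
    return bytes(out)


def send_entry(fields, path=JOURNAL_SOCKET):
    """Send one entry to journald (requires a running journald)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_entry(fields), path)


# Hypothetical job event; the FLUX_* field names are assumptions.
event = {
    "MESSAGE": "job submitted",
    "PRIORITY": 6,
    "FLUX_JOB_ID": "f2WXyz",
    "FLUX_EVENT_NAME": "submit",
}
datagram = encode_entry(event)
```

Calling `send_entry(event)` on a host running journald would submit the entry; encoding is kept separate from sending so it can be checked without a journal.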

grondo avatar Feb 11 '25 20:02 grondo

A worry is that dumping cluster RAS/job data into the systemd journal on a management node might fill up the limited journal storage allotment with Flux data and push out other useful things. The end goal presumably wouldn't be to do queries on the journal directly anyway, but to just get it out of there and into something scalable and site-specific.

So isn't the Python API we've already developed, which lets sites do their own thing, arguably the better solution?

garlick avatar Feb 12 '25 16:02 garlick

Your argument does make sense. I don't recall the exact arguments for the journald approach; I had just agreed to open an issue describing the idea.

If the worry is that the backend database may spend significant time down, and thus the Python-based consumers might have to cache large numbers of events to avoid loss of data due to eventlog truncation or job purges, then nothing is stopping the Python consumer from using the journal as a local store. (Though the same caveat about filling the journal applies.)

grondo avatar Feb 12 '25 17:02 grondo

The main thing I had in mind was to offload all of the work of creating reliable, resumable journal semantics to journald. Less for flux to implement, and perhaps less custom API since journald is well known (if journald really offers sufficient semantics to meet our needs).

Journald allows setting individual log limits, does it not? I'm not suggesting buffering things forever.

If flux is going to make these journals reliable across flux and node restarts, it needs to use up disk space too.

morrone avatar Feb 12 '25 17:02 morrone

AFAIK it only allows an overall limit to be set.

https://www.man7.org/linux/man-pages/man5/journald.conf.5.html

although I'm by no means an expert.

garlick avatar Feb 12 '25 17:02 garlick

It looks like they have "namespaces" (see same man page). So a flux namespace with its own independent limits could be used. It looks like the journalctl command takes a "--namespace" option for reading, but I haven't checked the C API to see what its namespace support looks like.
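As a rough sketch of the reading side, a consumer could shell out to `journalctl` with `--namespace` and resume from a saved cursor. The namespace name `flux` is an assumption, and the approach relies on `journalctl` options (`--namespace`, `--output=json`, `--after-cursor`) rather than the C API, which remains unchecked here. Each JSON entry carries a `__CURSOR` field that the consumer can persist and pass back on the next run.

```python
import json
import subprocess


def journalctl_cmd(namespace="flux", cursor=None, follow=False):
    """Build a journalctl command line for reading one namespace as JSON."""
    cmd = ["journalctl", f"--namespace={namespace}", "--output=json", "--no-pager"]
    if cursor:
        # Resume strictly after the last entry already processed.
        cmd.append(f"--after-cursor={cursor}")
    if follow:
        cmd.append("--follow")
    return cmd


def read_events(namespace="flux", cursor=None):
    """Yield journal entries as dicts (requires systemd with namespace support)."""
    proc = subprocess.run(
        journalctl_cmd(namespace, cursor),
        capture_output=True,
        text=True,
        check=True,
    )
    for line in proc.stdout.splitlines():
        if line.strip():
            yield json.loads(line)
```

A consumer would store the `__CURSOR` of the last entry it handled and pass it as `cursor` after a restart, which is the resumable behavior being discussed above.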

morrone avatar Feb 12 '25 18:02 morrone

Seems like that's a systemd ~~256~~ 245 feature? (RHEL 8 has systemd 239)

garlick avatar Feb 12 '25 18:02 garlick

> Seems like that's a systemd ~~256~~ 245 feature? (RHEL 8 has systemd 239)

Yeah, that figures.

morrone avatar Feb 13 '25 21:02 morrone