community z/OS SYSLOG support

We've had requests from users for Zowe to write relevant info to SYSLOG. This is not as simple as just writing our existing logs there. That's already been attempted by setting the STDOUT destination to /dev/console. Doing that causes Zowe to terminate because the system sends it a SIGHUP signal. I consulted z/OS experts, and it seems the reason for the SIGHUP is that SYSLOG has a line length limit of around 126 characters, which Zowe easily exceeds in standard logging.

SYSLOG support isn't just "wrap our lines to be under 126 chars". SYSLOG is a central place to view info, so programs on z/OS should not log too much there or it will be challenging to read truly important info.

So the request for SYSLOG support is more like this: Determine what's the most important info about Zowe, and log only that to SYSLOG, separate from our usual logging.

A particular user's feedback was this: They use IBM System Automation to control STCs and look for 4 types of messages from STCs in SYSLOG: ACTIVE (it's starting), UP (it's started and working), TERMINATING (it's going down) and TERMINATED (it's down). In addition, any messages indicating problems that alerts could be automated around would be great.

My wish is to make 3 things:

1. A SYSLOG logging format for our servers. As an alternative to, but similar to, https://github.com/zowe/community/blob/master/Technical-Steering-Committee/best-practices/message-management.md and https://github.com/zowe/zac/issues/90, we need rules that are specific to SYSLOG. We'd need a message format that fits in 126 characters at most.
2. Make Zowe Launcher find those messages in each process's STDOUT and send them to SYSLOG via the WTO macro https://www.ibm.com/docs/en/zos/2.4.0?topic=wto-write-operator

This way, we wouldn't need our existing servers to have any new ways to write messages, just to write some new messages in a specific format.

3. Have each server add messages that fit ACTIVE, UP, TERMINATING, TERMINATED. UP in particular is interesting. I've observed that APIML prints ZWEAM000I and app-server prints ZWED0031I before they're truly ready. ZWED0031I appears when app-server is ready to do app-server things, but if you try to log into the desktop as soon as you see ZWED0031I, you may be greeted with a response that APIML login failed because the /auth endpoint isn't accepting requests yet. APIML doesn't appear to write any message about when /auth is actually ready, so you wait and retry until successful.

May 03 '22 12:05 1000TurquoisePogs

To be researched: a good message ID and format for displaying message types such as ACTIVE, UP, TERMINATING, TERMINATED

In my opinion the launcher is the correct place to be writing to syslog because it knows which service sent a message and which user is running Zowe, and putting this syslog writing code in only one place would reduce the burden on everyone.

As for the format, we currently have a format like,

timestamp <jobname:pid and/or thid> userId msgLevel (message file/linenum) messageId messageText

for example, 2019-03-19 11:23:57.776 <ZWEAGW1:threadInformation> userID INFO (locationInformation) ZWEA001I Message text

Due to the length limits of both 3270 and WTO, I think this is far too long. SYSLOG also already prints timestamps, so let's cut out info that is either redundant or too verbose for the purpose of SYSLOG. How about simply:

userID msgLevel messageId messageText
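
To illustrate the trimming, here is a minimal sketch that reduces a line in the existing format to the shorter one. The regular expression is an assumption based only on the single example above, and `to_syslog_format` is a hypothetical helper name, not something that exists in Zowe.

```python
import re

# Assumed layout of the existing log line, taken from the example above:
# timestamp <jobname:thread> userId msgLevel (location) messageId messageText
LONG_FORMAT = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) "
    r"<(?P<job>[^:>]+):(?P<thread>[^>]*)> "
    r"(?P<user>\S+) (?P<level>\S+) "
    r"\((?P<location>[^)]*)\) "
    r"(?P<msg_id>\S+) (?P<text>.*)$"
)


def to_syslog_format(line):
    """Return 'userID msgLevel messageId messageText', or None if the line does not match."""
    match = LONG_FORMAT.match(line)
    if match is None:
        return None
    # Timestamp, jobname, thread and location are dropped on purpose.
    return "{user} {level} {msg_id} {text}".format(**match.groupdict())


example = ("2019-03-19 11:23:57.776 <ZWEAGW1:threadInformation> "
           "userID INFO (locationInformation) ZWEA001I Message text")
print(to_syslog_format(example))  # userID INFO ZWEA001I Message text
```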

May 03 '22 14:05 1000TurquoisePogs

I'm uncertain whether the servers even need to print jobname or process ID or if the launcher would add them or if they'd just be omitted. Certainly seems like either launcher or servers are perfectly capable of adding such info.

May 03 '22 16:05 1000TurquoisePogs

Clarification on jobname: It's not required, recipients of the syslog messages get jobname info already.

Jun 22 '22 14:06 1000TurquoisePogs

Previously I said:

Make Zowe Launcher find those messages in each process's STDOUT and send them to the SYSLOG via the WTO

Can we settle on the format messageId userID msgLevel messageText?

Having the messageId first will make it easy for the launcher to filter by message ID. If there were an ID range or suffix that was unique to WTO messages, it would also help.

Consider variations of ZWEA001I:

Putting L(og) at the end: ZWEA001IL userID INFO Message text
Using L(og) in place of severity: ZWEA001L userID INFO Message text
Using 9xx to signify logging: ZWEA901I userID INFO Message text

Please vote with a thumbs up or down on each option, or post an alternative if you have another idea.

Jun 22 '22 14:06 1000TurquoisePogs

Option 1: Putting L(og) at the end: ZWEA001IL userID INFO Message text

Jun 22 '22 14:06 1000TurquoisePogs

Option 2: Using L(og) in place of severity: ZWEA001L userID INFO Message text

This one seems like a bad idea because we lose severity information.

Jun 22 '22 14:06 1000TurquoisePogs

Option 3: Using 9xx to signify logging: ZWEA901I userID INFO Message text

We must pick a range that is not currently in use so we don't disrupt anything.

Jun 22 '22 14:06 1000TurquoisePogs

I prefer option 3 because the message ID fits within 8 characters.
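
A rough sketch of how filtering could look with option 3, assuming the message ID is the first token on the line. The ID pattern (ZWE, one component letter, a number from 900 to 999, one severity letter) and the name `select_for_syslog` are assumptions for illustration; real Zowe message IDs vary in length and the actual range is still to be chosen.

```python
import re

# Hypothetical ID shape for option 3: "ZWE", one component letter,
# a number in the 900-999 range, then a severity letter (8 characters total).
SYSLOG_ID = re.compile(r"^ZWE[A-Z]9\d{2}[A-Z]$")


def select_for_syslog(stdout_lines):
    """Yield lines whose first token is a message ID in the assumed 9xx range."""
    for line in stdout_lines:
        first_token = line.split(" ", 1)[0]
        if SYSLOG_ID.match(first_token):
            yield line


lines = [
    "ZWEA901I userID INFO Message text",
    "ZWEA001I userID INFO Message text",
    "plain output without a message ID",
]
print(list(select_for_syslog(lines)))
```

Only the first line is selected here. The selected lines would then be handed to whatever does the actual WTO call, which is not shown.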

Jun 28 '22 14:06 1000TurquoisePogs

By convention, doesn't the last character of the error code normally signify the severity?

E.g. ZWE0001I for informational, ZWE0001E for error, ZWE0001W for warning, ZWE0001C for critical.

Depending on people's automation, could having an L as the last character be a pain when parsing each message to determine severity?

Jun 29 '22 20:06 Cieronpowell

The L won't be so much a pain for automation, as it will look at the whole message ID and will use the unique number to sort things out. However, it will be a pain for humans acting on the message, as they have no idea what the severity of the message is. This basically forces the person creating automation to interpret the severity of each L-message and write it out again with a new severity indicator.

Jun 30 '22 13:06 OnnoVdT

The L won't be so much a pain for automation, as it will look at the whole message ID and will use the unique number to sort things out. However, it will be a pain for humans acting on the message, as they have no idea what the severity of the message is. This basically forces the person creating automation to interpret the severity of each L-message and write it out again with a new severity indicator.

As an example, I know for scripts I've written in the past, I will parse the last character first and if it's an I, I simply ignore it.

You can't do this for everything, but it tends to do the job in a lot of cases.
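
A minimal sketch of that kind of check, assuming the severity letters listed earlier in this thread (I, W, E, C). The function names are made up for the example.

```python
# Severity letters as listed earlier in this thread (I, W, E, C).
SEVERITY_BY_SUFFIX = {
    "I": "informational",
    "W": "warning",
    "E": "error",
    "C": "critical",
}


def severity_of(message_id):
    """Map the last character of a message ID to a severity name, or None if unknown."""
    return SEVERITY_BY_SUFFIX.get(message_id[-1:])


def needs_attention(message_id):
    """Ignore informational messages; an unknown suffix still needs a look."""
    return severity_of(message_id) != "informational"


for msg_id in ["ZWE0001I", "ZWE0001E", "ZWEA001IL"]:
    print(msg_id, severity_of(msg_id), needs_attention(msg_id))
```

With an ID such as ZWEA001IL, the last character is L, so a script like this can no longer tell the severity from the suffix alone.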

Jun 30 '22 16:06 Cieronpowell

Note from a meeting:

Don't use WTL; it's outdated. Just use WTO for writing a message
Writing this enhancement in the launcher becomes easier and more maintainable by making the launcher utilize zowe-common-c libraries
Making the launcher accomplish the writing by using the zowe-common-c's logging infrastructure gives us the flexibility to write to a variety of user-configurable destinations over time
This would make a good next PI goal
Line continuation: can be done and should be done in a way that marks the next lines as being part of the first line.

Jul 18 '22 16:07 1000TurquoisePogs

Regarding the line continuation comment. Console lines have a fixed length limitation, which differs slightly for the first and following lines. A longer WTO message will wrap automatically (at fixed location). There are ways to create a multi-line WTO, where the caller decides the wrap location (assuming it is less than or equal to the max length).
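
To make the caller-decided wrap concrete, here is a small sketch that splits text at word boundaries and marks continuation lines. The widths and the marker are placeholders, not the real console limits, and `split_for_console` is a hypothetical helper; it only prepares the lines and does not issue a WTO.

```python
def split_for_console(text, first_width=126, next_width=126, marker="  "):
    """Split text at word boundaries; continuation lines start with marker.

    The widths and the marker are assumptions, not documented console limits.
    A single word longer than the width is left unsplit.
    """
    lines = []
    current = ""       # line being built, including any continuation marker
    has_word = False   # whether current already holds at least one word
    width = first_width
    for word in text.split():
        candidate = current + (" " if has_word else "") + word
        if has_word and len(candidate) > width:
            # Close the current line and start a marked continuation line.
            lines.append(current)
            current = marker + word
            width = next_width
        else:
            current = candidate
        has_word = True
    if has_word:
        lines.append(current)
    return lines


print(split_for_console("aaa bbb ccc ddd", first_width=7, next_width=6, marker="+ "))
# ['aaa bbb', '+ ccc', '+ ddd']
```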

Jul 18 '22 16:07 OnnoVdT

What we discussed in the arch call:

Because this is a large task, it might be good to scope out a first version where

1. messages that go into SYSLOG come from the critical/severe logging levels (or their equivalents) of the components
2. create the Zowe launcher prototype with 1. and WTO commands

Then, after that's done, we could focus more time and energy on which messages should be added or removed. Basically, make the thing work first, and then it will be a cross-squad effort (if creating the prototype wasn't one already) to improve the quality of the logging.

Jun 27 '23 16:06 DivergentEuropeans

This appears to be complete in v2.11

Sep 08 '23 13:09 1000TurquoisePogs
