z/OS SYSLOG support
We've had requests from users for Zowe to support writing relevant info to SYSLOG. This is not as simple as writing our existing logs there. That has already been attempted by setting the STDOUT destination to /dev/console. Doing that causes Zowe to terminate, because the system sends a SIGHUP signal to Zowe. I consulted z/OS experts, and it seems the reason for the SIGHUP is that SYSLOG has a line length limit of around 126 characters, which Zowe easily exceeds in standard logging.
SYSLOG support isn't just "wrap our lines to be under 126 chars". SYSLOG is a central place to view info, so programs on z/OS should not log too much there, or truly important info becomes hard to find.
So the request for SYSLOG support is more like this: Determine what's the most important info about Zowe, and log only that to SYSLOG, separate from our usual logging.
A particular user's feedback was this: they use IBM System Automation to control STCs and look for 4 types of messages from STCs in SYSLOG: ACTIVE (it's starting), UP (it's started and working), TERMINATING (it's going down) and TERMINATED (it's down). In addition, any messages indicating problems that automated alerts could be built around would be great.
My wish is to make 3 things:
- A SYSLOG logging format for our servers. Similar to (but separate from) https://github.com/zowe/community/blob/master/Technical-Steering-Committee/best-practices/message-management.md and https://github.com/zowe/zac/issues/90, we need rules that are specific to SYSLOG, including a message format that fits in 126 characters max.
- Make Zowe Launcher find those messages in each process's STDOUT and send them to SYSLOG via the WTO macro https://www.ibm.com/docs/en/zos/2.4.0?topic=wto-write-operator
  This way, our existing servers wouldn't need any new ways to write messages; they would just need to write some new messages in a specific format.
- Have each server add messages that fit ACTIVE, UP, TERMINATING, TERMINATED. UP in particular is interesting. I've observed that APIML prints ZWEAM000I and app-server prints ZWED0031I before they're truly ready. ZWED0031I appears when app-server is ready to do app-server things, but if you try to log into the desktop as soon as you see ZWED0031I, you may be greeted with a response that APIML login failed because the /auth endpoint isn't accepting requests yet. APIML doesn't appear to write any message about when /auth is actually ready, so you wait and retry until successful.
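A minimal sketch of the launcher-side forwarding described in the second item, written in Python for readability. It assumes (as proposed later in this thread) that SYSLOG-bound lines start with the message ID. The pattern `SYSLOG_LINE` and the `wto` callback are hypothetical; `wto` stands in for whatever actually issues the WTO on z/OS.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical: a SYSLOG-bound line begins with a Zowe message ID such as
# ZWEA901I or ZWED0031I, followed by whitespace and the rest of the message.
# Regular log lines start with a timestamp, so they do not match.
SYSLOG_LINE = re.compile(r"^ZWE[A-Z0-9]{4,5}[A-Z]\s")


def forward_to_syslog(lines: Iterable[str], wto: Callable[[str], None]) -> List[str]:
    """Pass each matching STDOUT line to `wto` and return the forwarded lines.

    `wto` is a placeholder for the real WTO invocation on z/OS.
    """
    forwarded = []
    for raw in lines:
        line = raw.rstrip("\n")
        if SYSLOG_LINE.match(line):
            wto(line)
            forwarded.append(line)
    return forwarded
```

The servers keep writing to STDOUT as they do today; only the lines in the agreed format are picked out and forwarded.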
To be researched: a good message ID and format for displaying message types such as ACTIVE, UP, TERMINATING, TERMINATED.
In my opinion, the launcher is the correct place to write to SYSLOG, because it knows which service sent a message and which user is running Zowe, and putting the SYSLOG-writing code in only one place would reduce the burden on everyone.
As for the format, we currently have a format like,
timestamp <jobname:pid and/or thid> userId msgLevel (message file/linenum) messageId messageText
for example,
2019-03-19 11:23:57.776 <ZWEAGW1:threadInformation> userID INFO (locationInformation) ZWEA001I Message text
Due to the length limits of both 3270 and WTO, I think this is far too long. SYSLOG also already prints timestamps, so let's cut out redundant or overly verbose info for the purpose of SYSLOG. How about simply:
userID msgLevel messageId messageText
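To make the proposed reduction concrete, here is a sketch that parses a line in the current format and re-emits only `userID msgLevel messageId messageText`. The regular expression is an assumption based on the single example line above; real log lines from each component may vary.

```python
import re
from typing import Optional

# Assumed layout of the current format:
# timestamp <jobname:pid/thid> userId msgLevel (file/linenum) messageId messageText
CURRENT_FORMAT = re.compile(
    r"^\S+ \S+ "          # date and time
    r"<[^>]*> "           # <jobname:pid and/or thid>
    r"(?P<user>\S+) "
    r"(?P<level>\S+) "
    r"\([^)]*\) "         # (message file/linenum)
    r"(?P<msgid>\S+) "
    r"(?P<text>.*)$"
)


def to_short_format(line: str) -> Optional[str]:
    """Return 'userID msgLevel messageId messageText', or None if the line doesn't parse."""
    m = CURRENT_FORMAT.match(line)
    if m is None:
        return None
    return f"{m['user']} {m['level']} {m['msgid']} {m['text']}"
```

Applied to the example line, this drops the timestamp, the `<jobname:...>` block and the `(locationInformation)` part.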
I'm uncertain whether the servers even need to print jobname or process ID, whether the launcher would add them, or whether they'd just be omitted. Either the launcher or the servers are certainly capable of adding such info.
Clarification on jobname: it's not required; recipients of the SYSLOG messages already get jobname info.
Previously I said:
Make Zowe Launcher find those messages in each process's STDOUT and send them to the SYSLOG via the WTO
Can we settle on a format of
messageId userID msgLevel messageText
Having the messageId first will make it easy for the launcher to filter by message ID. If there were an ID range or suffix unique to WTO messages, that would also help.
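A sketch of building a line in the proposed `messageId userID msgLevel messageText` order, and of why putting the ID first helps: the launcher can find it with a single split. The 126-character limit comes from the discussion above; the function names and the `ZWEA9` prefix in the usage example are illustrative only.

```python
from typing import Iterable, List, Tuple

MAX_SYSLOG_LINE = 126  # approximate console line limit discussed in this issue


def format_syslog_message(message_id: str, user_id: str, level: str, text: str) -> str:
    """Build 'messageId userID msgLevel messageText', rejecting over-long lines."""
    line = f"{message_id} {user_id} {level} {text}"
    if len(line) > MAX_SYSLOG_LINE:
        raise ValueError(f"{message_id}: {len(line)} chars exceeds {MAX_SYSLOG_LINE}")
    return line


def message_id_of(line: str) -> str:
    """The ID is the first token, so one split is enough to find it."""
    return line.split(" ", 1)[0]


def filter_by_prefix(lines: Iterable[str], prefixes: Tuple[str, ...]) -> List[str]:
    """Keep lines whose message ID starts with one of `prefixes` (e.g. a reserved range)."""
    return [line for line in lines if message_id_of(line).startswith(prefixes)]
```

For example, `filter_by_prefix(lines, ("ZWEA9",))` would keep only IDs in a hypothetical 9xx range for component A.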
Consider variations of ZWEA001I:
- Putting L(og) at the end: ZWEA001IL userID INFO Message text
- Using L(og) in place of severity: ZWEA001L userID INFO Message text
- Using 9xx to signify logging: ZWEA901I userID INFO Message text
Please vote with a thumbs up or down on them if you like them, or post an alternative if you have another idea.
Option 1: Putting L(og) at the end: ZWEA001IL userID INFO Message text
Option 2: Using L(og) in place of severity: ZWEA001L userID INFO Message text
This one seems like a bad idea because we lose severity information.
Option 3: Using 9xx to signify logging: ZWEA901I userID INFO Message text
We must pick a range that is not currently in use so we don't disrupt existing messages.
I prefer option 3 because the message ID fits within 8 characters.
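Option 3 can be checked mechanically. This sketch assumes the ID shape of the example (`ZWE`, a single component letter, a three-digit number in a reserved `9xx` range, and a severity letter I, W, E or C); both the shape and the range are assumptions, since some Zowe components use longer component codes.

```python
import re

# Assumed shape for option 3: ZWE + one component letter + 9xx + severity letter.
OPTION3_ID = re.compile(r"ZWE[A-Z]9\d\d[IWEC]")


def is_option3_id(message_id: str) -> bool:
    """True if the ID is in the hypothetical 9xx range and still carries a severity."""
    return OPTION3_ID.fullmatch(message_id) is not None
```

A matching ID is always exactly 8 characters, whereas option 1 IDs such as ZWEA001IL are 9.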
By convention, doesn't the last character of the message ID normally signify the severity?
E.g. ZWE0001I for informational, ZWE0001E for error, ZWE0001W for warning, ZWE0001C for critical.
Depending on people's automation, could having an L as the last character be a pain when parsing each message to determine its severity?
The L won't be so much a pain for automation, as it will look at the whole message ID and will use the unique number to sort things out. However, it will be a pain for humans acting on the message, as they have no idea what the severity of the message is. This basically forces the person creating automation to interpret the severity of each L-message and write it out again with a new severity indicator.
For example, in scripts I've written in the past, I parse the last character first, and if it's an I, I simply ignore the message.
You can't do this for everything, but it tends to do the job in a lot of cases.
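The "check the last character first" approach might look like the following sketch. The severity letters are the ones given in the example above; `needs_attention` is a hypothetical helper. It also illustrates the concern about option 2: an ID ending in L has no recognizable severity, so it can't safely be ignored.

```python
SEVERITY = {"I": "informational", "W": "warning", "E": "error", "C": "critical"}


def severity_of(message_id: str) -> str:
    """Map the last character of a message ID to a severity name."""
    return SEVERITY.get(message_id[-1:], "unknown")  # slicing avoids IndexError on ""


def needs_attention(line: str) -> bool:
    """Skip informational messages; anything else (including unknown) is kept."""
    message_id = line.split(" ", 1)[0]
    return severity_of(message_id) != "informational"
```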
Note from a meeting:
- Don't use WTL; it's outdated. Just use WTO for writing a message
- Writing this enhancement in the launcher becomes easier and more maintainable by having the launcher use zowe-common-c libraries
- Making the launcher accomplish the writing by using the zowe-common-c's logging infrastructure gives us the flexibility to write to a variety of user-configurable destinations over time
- This would make a good next PI goal
- Line continuation: this can and should be done in a way that marks the following lines as part of the first line.
Regarding the line continuation comment: console lines have a fixed length limit, which differs slightly between the first line and the following lines. A longer WTO message will wrap automatically (at a fixed location). There are ways to create a multi-line WTO, where the caller decides the wrap location (assuming it is less than or equal to the maximum length).
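A sketch of caller-controlled wrapping: break a long message at word boundaries, with one maximum for the first line and another for continuation lines, and mark each continuation line. The limits are parameters because the exact console values aren't stated here, and the `+ ` marker is purely hypothetical; in a multi-line WTO the lines are already grouped as one message, so such a marker may be unnecessary there.

```python
from typing import List


def split_for_console(text: str, first_max: int, cont_max: int, cont_prefix: str = "+ ") -> List[str]:
    """Wrap `text` at word boundaries.

    The first line holds at most `first_max` characters; every continuation line
    holds at most `cont_max` characters including the hypothetical `cont_prefix`.
    Words longer than a whole line are split hard.
    """
    cont_width = cont_max - len(cont_prefix)
    if first_max < 1 or cont_width < 1:
        raise ValueError("line limits are too small")
    bodies: List[str] = []
    current, width = "", first_max
    for word in text.split():
        # Room left on the current line (one extra char for the separating space).
        while len(word) > width - (len(current) + 1 if current else 0):
            if current:
                # Current line is full: emit it and continue on a new line.
                bodies.append(current)
                current, width = "", cont_width
            else:
                # The word alone is too long for an empty line: split it.
                bodies.append(word[:width])
                word, width = word[width:], cont_width
        if word:
            current = f"{current} {word}" if current else word
    if current:
        bodies.append(current)
    return bodies[:1] + [cont_prefix + body for body in bodies[1:]]
```

For example, `split_for_console("alpha beta gamma delta", first_max=11, cont_max=10)` yields `["alpha beta", "+ gamma", "+ delta"]`.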
What we discussed in the arch call:
Because this is a large task, it might be good to scope out a first version where
- messages that go into SYSLOG come from the critical/severe logging levels (or their equivalents) of the components
- create the Zowe Launcher prototype that sends the messages from the first point to SYSLOG using WTO
Then, once that's done, we could focus more energy and time on what messages should be added or removed. Basically, make the thing work first; then it will be a cross-squad effort (if creating the prototype wasn't already) to improve the quality of the logging.
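For the proposed first version, the launcher-side selection could be as simple as this sketch, which keeps only lines at a critical or severe level. It assumes the `messageId userID msgLevel messageText` order proposed earlier; the names in `FORWARDED_LEVELS` are placeholders for whatever each component's equivalents turn out to be.

```python
from typing import AbstractSet, Iterable, List

# Placeholder level names; each component's equivalents would be mapped here.
FORWARDED_LEVELS = frozenset({"CRITICAL", "SEVERE"})


def select_for_syslog(lines: Iterable[str], levels: AbstractSet[str] = FORWARDED_LEVELS) -> List[str]:
    """Keep lines shaped 'messageId userID msgLevel messageText' at a forwarded level."""
    selected = []
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[2].upper() in levels:
            selected.append(line)
    return selected
```

Widening the selection later is then a matter of changing `levels` rather than the servers.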
This appears to be complete in v2.11