jetstack-secure icon indicating copy to clipboard operation
jetstack-secure copied to clipboard

[VC-35738] Log with klog to stdout and stderr in JSON format

Open wallrj opened this issue 1 year ago • 4 comments

I've settled on using klog and kubernetes component-base module for the reasons given in the comments in the logs.go file. I think this logging UX is most likely to be familiar to Kubernetes platform admins who are likely to be deploying and debugging the venafi-kubernetes-agent. I gave up on Slog because it seemed to have unfamiliar severity / levels and I couldn't get the klog client-go logs to have the right levels with slog. Using klog allows us to debug problems in client-go. For example, you can view the API requests and responses using --vmodule=round_tripper=6.

In followup PRs I will convert the existing log lines to use context derived logr loggers.

$ go test ./pkg/logs/... -v -run TestLogs -count 1
=== RUN   TestLogs
=== RUN   TestLogs/help
    logs_test.go:134: FLAGS
        -h
    logs_test.go:136: STDOUT
        pflag: help requested

    logs_test.go:137: STDERR
        Usage of test-logs:
              --logging-format string   Sets the log format. Permitted formats: "json", "text". (default "json")
          -v, --v Level                 number for the log level verbosity
              --vmodule pattern=N,...   comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)

    logs_test.go:142: ERROR: exit status 2
=== RUN   TestLogs/unrecognized-flag
    logs_test.go:134: FLAGS
        --foo
    logs_test.go:136: STDOUT
        unknown flag: --foo

    logs_test.go:137: STDERR
        unknown flag: --foo
        Usage of test-logs:
              --logging-format string   Sets the log format. Permitted formats: "json", "text". (default "json")
          -v, --v Level                 number for the log level verbosity
              --vmodule pattern=N,...   comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)

    logs_test.go:142: ERROR: exit status 2
=== RUN   TestLogs/logging-format-unrecognized
    logs_test.go:134: FLAGS
        --logging-format=foo
    logs_test.go:136: STDOUT

    logs_test.go:137: STDERR
        Error in logging configuration: format: Invalid value: "foo": Unsupported log format

    logs_test.go:142: ERROR: exit status 2
=== RUN   TestLogs/original-defaults
    logs_test.go:134: FLAGS

    logs_test.go:136: STDOUT

    logs_test.go:137: STDERR
        vCert: 2024/10/17 08:10:12 log Print
        vCert: 2024/10/17 08:10:12 INFO slog Info
        vCert: 2024/10/17 08:10:12 WARN slog Warn
        vCert: 2024/10/17 08:10:12 ERROR slog Error
        I1017 08:10:12.337242   29346 logs_test.go:46] klog Info
        W1017 08:10:12.337247   29346 logs_test.go:47] klog Warning
        E1017 08:10:12.337254   29346 logs_test.go:48] "klog Error" err="fake-error"
        I1017 08:10:12.337259   29346 logs_test.go:49] "klog InfoS" key="value"
        E1017 08:10:12.337268   29346 logs_test.go:52] "Contextual error" err="fake-error" logger="foo" key="value"

=== RUN   TestLogs/modified-defaults
    logs_test.go:134: FLAGS

    logs_test.go:136: STDOUT
        {"ts":1729149012343.6404,"caller":"log/log.go:245","msg":"log Print","v":0}
        {"ts":1729149012343.672,"caller":"logs/logs_test.go:43","msg":"slog Info","v":0}
        {"ts":1729149012343.6807,"caller":"logs/logs_test.go:44","msg":"slog Warn","v":0}
        {"ts":1729149012343.7136,"caller":"logs/logs_test.go:46","msg":"klog Info","v":0}
        {"ts":1729149012343.7205,"caller":"logs/logs_test.go:47","msg":"klog Warning","v":0}
        {"ts":1729149012343.7432,"caller":"logs/logs_test.go:49","msg":"klog InfoS","v":0,"key":"value"}

    logs_test.go:137: STDERR
        {"ts":1729149012343.6902,"caller":"logs/logs_test.go:45","msg":"slog Error"}
        {"ts":1729149012343.7363,"caller":"logs/logs_test.go:48","msg":"klog Error","err":"fake-error"}
        {"ts":1729149012343.759,"logger":"foo","caller":"logs/logs_test.go:52","msg":"Contextual error","key":"value","err":"fake-error"}

=== RUN   TestLogs/logging-format-json
    logs_test.go:134: FLAGS
        --logging-format=json
    logs_test.go:136: STDOUT
        {"ts":1729149012349.122,"caller":"log/log.go:245","msg":"log Print","v":0}
        {"ts":1729149012349.1545,"caller":"logs/logs_test.go:43","msg":"slog Info","v":0}
        {"ts":1729149012349.1628,"caller":"logs/logs_test.go:44","msg":"slog Warn","v":0}
        {"ts":1729149012349.1802,"caller":"logs/logs_test.go:46","msg":"klog Info","v":0}
        {"ts":1729149012349.189,"caller":"logs/logs_test.go:47","msg":"klog Warning","v":0}
        {"ts":1729149012349.2192,"caller":"logs/logs_test.go:49","msg":"klog InfoS","v":0,"key":"value"}

    logs_test.go:137: STDERR
        {"ts":1729149012349.1685,"caller":"logs/logs_test.go:45","msg":"slog Error"}
        {"ts":1729149012349.206,"caller":"logs/logs_test.go:48","msg":"klog Error","err":"fake-error"}
        {"ts":1729149012349.236,"logger":"foo","caller":"logs/logs_test.go:52","msg":"Contextual error","key":"value","err":"fake-error"}

=== RUN   TestLogs/log-json-split-stream-false
    logs_test.go:134: FLAGS
        --logging-format=json --log-json-split-stream=false
    logs_test.go:136: STDOUT

    logs_test.go:137: STDERR
        {"ts":1729149012355.1396,"caller":"log/log.go:245","msg":"log Print","v":0}
        {"ts":1729149012355.1697,"caller":"logs/logs_test.go:43","msg":"slog Info","v":0}
        {"ts":1729149012355.177,"caller":"logs/logs_test.go:44","msg":"slog Warn","v":0}
        {"ts":1729149012355.183,"caller":"logs/logs_test.go:45","msg":"slog Error"}
        {"ts":1729149012355.1924,"caller":"logs/logs_test.go:46","msg":"klog Info","v":0}
        {"ts":1729149012355.1992,"caller":"logs/logs_test.go:47","msg":"klog Warning","v":0}
        {"ts":1729149012355.203,"caller":"logs/logs_test.go:48","msg":"klog Error","err":"fake-error"}
        {"ts":1729149012355.2168,"caller":"logs/logs_test.go:49","msg":"klog InfoS","v":0,"key":"value"}
        {"ts":1729149012355.2212,"logger":"foo","caller":"logs/logs_test.go:52","msg":"Contextual error","key":"value","err":"fake-error"}

=== RUN   TestLogs/logging-format-text
    logs_test.go:134: FLAGS
        --logging-format=text
    logs_test.go:136: STDOUT
        I1017 08:10:12.360983   29371 log.go:245] log Print
        I1017 08:10:12.361035   29371 logs_test.go:43] "slog Info"
        I1017 08:10:12.361060   29371 logs_test.go:46] klog Info
        I1017 08:10:12.361079   29371 logs_test.go:49] "klog InfoS" key="value"

    logs_test.go:137: STDERR
        W1017 08:10:12.361052   29371 logs_test.go:44] "slog Warn"
        E1017 08:10:12.361056   29371 logs_test.go:45] "slog Error"
        W1017 08:10:12.361063   29371 logs_test.go:47] klog Warning
        E1017 08:10:12.361073   29371 logs_test.go:48] "klog Error" err="fake-error"
        E1017 08:10:12.361083   29371 logs_test.go:52] "Contextual error" err="fake-error" logger="foo" key="value"

=== RUN   TestLogs/log-text-split-stream-false
    logs_test.go:134: FLAGS
        --logging-format=text --log-text-split-stream=false
    logs_test.go:136: STDOUT

    logs_test.go:137: STDERR
        I1017 08:10:12.367297   29377 log.go:245] log Print
        I1017 08:10:12.367352   29377 logs_test.go:43] "slog Info"
        W1017 08:10:12.367358   29377 logs_test.go:44] "slog Warn"
        E1017 08:10:12.367361   29377 logs_test.go:45] "slog Error"
        I1017 08:10:12.367364   29377 logs_test.go:46] klog Info
        W1017 08:10:12.367367   29377 logs_test.go:47] klog Warning
        E1017 08:10:12.367382   29377 logs_test.go:48] "klog Error" err="fake-error"
        I1017 08:10:12.367402   29377 logs_test.go:49] "klog InfoS" key="value"
        E1017 08:10:12.367406   29377 logs_test.go:52] "Contextual error" err="fake-error" logger="foo" key="value"

=== RUN   TestLogs/v-level-3
    logs_test.go:134: FLAGS
        --v=3
    logs_test.go:136: STDOUT
        {"ts":1729149012373.1528,"caller":"log/log.go:245","msg":"log Print","v":0}
        {"ts":1729149012373.1838,"caller":"logs/logs_test.go:43","msg":"slog Info","v":0}
        {"ts":1729149012373.1895,"caller":"logs/logs_test.go:44","msg":"slog Warn","v":0}
        {"ts":1729149012373.2012,"caller":"logs/logs_test.go:46","msg":"klog Info","v":0}
        {"ts":1729149012373.207,"caller":"logs/logs_test.go:47","msg":"klog Warning","v":0}
        {"ts":1729149012373.217,"caller":"logs/logs_test.go:49","msg":"klog InfoS","v":0,"key":"value"}
        {"ts":1729149012373.2212,"logger":"foo","caller":"logs/logs_test.go:51","msg":"Contextual Info Level 3","v":3,"key":"value"}

    logs_test.go:137: STDERR
        {"ts":1729149012373.194,"caller":"logs/logs_test.go:45","msg":"slog Error"}
        {"ts":1729149012373.2112,"caller":"logs/logs_test.go:48","msg":"klog Error","err":"fake-error"}
        {"ts":1729149012373.225,"logger":"foo","caller":"logs/logs_test.go:52","msg":"Contextual error","key":"value","err":"fake-error"}

=== RUN   TestLogs/vmodule-level-3
    logs_test.go:134: FLAGS
        --logging-format=text --vmodule=logs_test=3
    logs_test.go:136: STDOUT
        I1017 08:10:12.379068   29389 log.go:245] log Print
        I1017 08:10:12.379865   29389 logs_test.go:43] "slog Info"
        I1017 08:10:12.379892   29389 logs_test.go:46] klog Info
        I1017 08:10:12.379906   29389 logs_test.go:49] "klog InfoS" key="value"
        I1017 08:10:12.379911   29389 logs_test.go:51] "Contextual Info Level 3" logger="foo" key="value"

    logs_test.go:137: STDERR
        W1017 08:10:12.379884   29389 logs_test.go:44] "slog Warn"
        E1017 08:10:12.379888   29389 logs_test.go:45] "slog Error"
        W1017 08:10:12.379894   29389 logs_test.go:47] klog Warning
        E1017 08:10:12.379900   29389 logs_test.go:48] "klog Error" err="fake-error"
        E1017 08:10:12.379915   29389 logs_test.go:52] "Contextual error" err="fake-error" logger="foo" key="value"

--- PASS: TestLogs (0.07s)
    --- PASS: TestLogs/help (0.01s)
    --- PASS: TestLogs/unrecognized-flag (0.01s)
    --- PASS: TestLogs/logging-format-unrecognized (0.01s)
    --- PASS: TestLogs/original-defaults (0.01s)
    --- PASS: TestLogs/modified-defaults (0.01s)
    --- PASS: TestLogs/logging-format-json (0.01s)
    --- PASS: TestLogs/log-json-split-stream-false (0.01s)
    --- PASS: TestLogs/logging-format-text (0.01s)
    --- PASS: TestLogs/log-text-split-stream-false (0.01s)
    --- PASS: TestLogs/v-level-3 (0.01s)
    --- PASS: TestLogs/vmodule-level-3 (0.01s)
PASS
ok      github.com/jetstack/preflight/pkg/logs  0.077s
$ ./hack/e2e/test.sh
...
{
  "ts": 1729108594511.0774,
  "caller": "log/log.go:245",
  "msg": "successfully gathered 30 items from \"k8s/replicasets\" datagatherer",
  "v": 0
}
{
  "ts": 1729108594511.1047,
  "caller": "log/log.go:245",
  "msg": "Posting data to: ",
  "v": 0
}
{"ts":1729108595121.0984,"caller":"log/log.go:245","msg":"Data sent successfully.","v":0}

image

wallrj avatar Oct 16 '24 22:10 wallrj

Hi, does cert-manager and the other components use klog? Do they use the conventions that this MR is adopting?

james-w avatar Oct 17 '24 14:10 james-w

Hey Richard, great job with adding unit tests around the logs. I really appreciate the effort. Thanks to you, I can just run go test -v to check what the various logging outputs are is great. Well done!

Does cert-manager and the other components use klog? Do they use the conventions that this MR is adopting?

To the question of consistency:

  • The library convention is klog for older projects such as cert-manager.
  • For newer projects, the convention is to use klogr as we prefer using structured logging
  • The default format convention seems to be klog's text format.

For that reason, the choice of klogr would make more sense in the Venafi Kubernetes Agent as it would bring structured logging. My first suggestion would be for us to start with klogr since klogr and klog don't have the same flags, and that would reduce the chance for people realizing they can't use --v anymore. We don't need to re-structure the logs right away, we can do that later (if ever).

Note that I found in https://github.com/kubernetes/klog/commit/44eadc3e71f44b0c1d05d418f56b9ad2a26dbb22 that slog is preferred over klogr since last year:

The klogr package doesn't get updated. It shouldn't be used anymore. To make that clearer, it gets marked as "deprecated".

My second suggestion, based on the fact that klogr is being phased out, would be to use std lib's log/slog and use klog as the backend. WDYT @wallrj?

Full picture:

project lib flag values default** supports json format
cert-manager klog -v, --v 0 to 9 info, 2 yes with --logging-format=json, default to klog's text format
cert-manager webhook klog -v, --v 0 to 9 info, 2 yes with --logging-format=json, default to klog's text format
openshift-routes klogr -v, --log-level 0 to 9 info, 1 no, uses klog's text format
csi-driver klogr -v, --log-level 0 to 9 info, 1 no, uses klog's text format
trust-manager klogr -v, --log-level 0 to 9 info, 1 yes with --log-format=json, defaults to klog's text format
approver-policy klogr -v, --log-level 0 to 9 info, 1 yes with --log-format=json, defaults to klog's text format
venafi-enhanced-issuer zap -zap-log-level debug, info, error, 0 to 9 info yes with --zap-encoder=json, defaults to zap's text format
Agent (before this PR) log + klog* none no, uses log's and klogs' text formats
Agent (with this PR) klog -v, --v 0 to 9 ? yes by default, can use klog's text format using --logging-format=text
venctl slog*** --log-level debug, info, warn, error info yes with --log-format=json, defaults to slog's text format
vcert log --verbose true, false false no, uses log's text format

* log is the standard library's log package. It is used for the agent's code, while klog is used by the client-go library that the agent uses. ** For example, "info, 4" means that the default severity is "info" and the default level for the "info" severity is 4. The level is always for the "info" level. *** "slog" and "log" belongs to the standard library (log/slog).

maelvls avatar Oct 18 '24 07:10 maelvls

Hi, yep, consistency was exactly my concern, thanks for the detailed response. In my opinion it's favourable to have a smaller set to make updates easier down the line, even if that smaller set doesn't include the current best practice yet. If this project were to adopt the conventions of the majority of the other projects then we would have more consistency, and could figure out how/when to migrate them en-masse. Alternatively there may be an argument to update to the latest/greatest now, it depends on the details.

It sounds like you think that log/slog would be preferable to klog at this moment in this project?

james-w avatar Oct 18 '24 09:10 james-w

@wallrj I've reviewed your PR and made some small adjustments:

  • In d9afd9508612005ca77ebfd556538c1076e59da4, I moved everything to slog (it was pretty easy). We can later migrate every logs.Log.Printf with slog.Info.

  • In 33f5a12756a22395a02fb8fdf9c22702fb2549f5, I added assertions to the stderr and stdout outputs of each test case so that we can track the changes I have made when I moved to slog.

  • In d9afd9508612005ca77ebfd556538c1076e59da4, I've added a "source" to indicate whether a log line originates from VCert or from the agent itself.

  • In 4428a242dcc630ec93f2c594fe2350d07483c0e1, I've made sure the logs.Log.Fatalf logs are showing with the ERROR severity in slog.

  • In 6752d49, I removed --v and replaced it with --log-level to match the long form of -v that most of our components uses (approver-policy, trust-manager, openshift-routes, and csi-driver). The help now looks like this:

    +  -v, --log-level Level         number for the log level verbosity
    -  -v, --v Level                 number for the log level verbosity
    

Regarding the flags:

It seems we (the cert-manager project) have initiated the convention of the flags -v and --log-level. It has nothing to do with klogr; klogr itself doesn't have a convention around the flag names. Example: https://github.com/cert-manager/openshift-routes/blob/0da68aed69ef66c8c9cefde45d855410813a6b3c/internal/cmd/app/options/options.go#L114-L116

I propose that we ~only expose the -v flag~ expose --log-level instead of --v since that's the long form flag that is the most common across our components (approver-policy, trust-manager, openshift-routes, and csi-driver).

Regarding the slog severity problem:

I gave up on Slog because it seemed to have unfamiliar severity / levels and I couldn't get the klog client-go logs to have the right levels with slog.

I've investigated and was able to configure an slog logger in d9afd9508612005ca77ebfd556538c1076e59da4. To avoid lots of unrelated changes, the agent logs are still using the standard library's *log.Logger.

Regarding the output format:

I found that none of our projects currently output JSON by default. You mentioned the Kubernetes logging conventions, which say that the JSON format must be enabled with --logging-format, which means that if not passed, the output shouldn't be JSON:

Requires passing --logging-format=json to enable.

@wallrj IMO we should set the default output to "text"... WDYT? I haven't reverted to "text". I'll let you decide.

@wallrj Feel free to merge this PR once you are happy with it. I'll approve it now.

maelvls avatar Oct 18 '24 14:10 maelvls

@wallrj IMO we should set the default output to "text"... WDYT? I haven't reverted to "text". I'll let you decide.

I agree that text format by default is more consistent with other TLSPK components. Done.

@wallrj Feel free to merge this PR once you are happy with it. I'll approve it now.

I'll merge this after the 1.2.0 release because I'd like to finish the other logging improvements before releasing to customers.

wallrj avatar Oct 25 '24 16:10 wallrj