HealthCheck log output options

Open Honny1 opened this issue 1 year ago • 8 comments

This PR adds three new flags that control the output of the HealthCheck log.

Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via podman inspect. It is also limited to the last five executions and the first 500 characters per execution.

This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic healthcheck service failed record.

  • The --health-log-destination flag sets the destination of the HealthCheck log.

    • none: (default behavior) HealthCheckResults are stored in overlay containers. (For example: ./run/containers/storage/overlay-containers/<container-ID>/healthcheck.log)
    • directory: creates a log file named <container-ID>-healthcheck.log with JSON HealthCheckResults in the specified directory.
    • events_logger: the log is written using the logging mechanism set by events_logger.
  • The --health-max-log-count flag sets the maximum number of attempts in the HealthCheck log file.

    • A value of 0 indicates an infinite number of attempts in the log file.
    • The default value is 5 attempts in the log file.
  • The --health-max-log-size flag sets the maximum length of the log stored.

    • A value of 0 indicates an infinite log length.
    • The default value is 500 log characters.
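The retention rules described by the two limit flags can be sketched as follows. This is a minimal illustration in Python, not Podman's actual implementation: the function names `truncate_output` and `append_attempt` are hypothetical, and the only behavior taken from the description above is the defaults (5 attempts, 500 characters), the "first N characters" truncation, and the meaning of 0 as unlimited.

```python
from __future__ import annotations


def truncate_output(output: str, max_size: int = 500) -> str:
    """Keep the first max_size characters of one attempt's output.

    A max_size of 0 means the output is kept in full.
    """
    if max_size == 0:
        return output
    return output[:max_size]


def append_attempt(
    log: list[str],
    output: str,
    max_count: int = 5,
    max_size: int = 500,
) -> list[str]:
    """Return a new log with one more attempt appended.

    Only the most recent max_count attempts are kept.
    A max_count of 0 means every attempt is kept.
    """
    entries = log + [truncate_output(output, max_size)]
    if max_count == 0:
        return entries
    return entries[-max_count:]


if __name__ == "__main__":
    log: list[str] = []
    for i in range(7):
        log = append_attempt(log, f"attempt {i}")
    # With the default limit, only the five most recent attempts remain.
    print(len(log), log[0])
```

With both limits set to 0 the log grows without bound, which is why the defaults stay at 5 and 500.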

Does this PR introduce a user-facing change?

Added --health-log-destination, --health-max-log-count and --health-max-log-size flags that affect HealthCheck log output.

Fixes: RHEL-24623

Honny1 avatar Sep 09 '24 13:09 Honny1

@mheon PTAL

Honny1 avatar Sep 19 '24 13:09 Honny1

@Luap99 PTAL, particularly at the events bits. I don't really mind but we're getting a lot of feedback about how we're handling events.

mheon avatar Sep 19 '24 14:09 mheon

@Luap99 PTAL

Honny1 avatar Sep 22 '24 20:09 Honny1

@mheon @Luap99 PTAL, I have resolved the feedback and checked that the defaults are propagated correctly.

Honny1 avatar Sep 24 '24 11:09 Honny1

@edsantiago @Luap99 PTAL, I've modified the code according to your feedback.

Honny1 avatar Sep 24 '24 19:09 Honny1

Looking, but, test flakes rootless on my laptop:

✗ |220| podman healthcheck --health-max-log-size infinite value (0) [3485]
...
...very very very very long string
#/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#|     FAIL: Number of matching health log messages
#| expected: -eq 2
#|   actual:     1
#\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

edsantiago avatar Sep 24 '24 19:09 edsantiago

This happens because podman run creates and starts the systemd timer right away, which triggers the first HealthCheck run when the container is created. The test then triggers podman healthcheck run manually. That's why you can get a second run, but whether you do depends on systemd. I can avoid this by using the -ge comparison. WDYT? @edsantiago
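To illustrate the idea outside the test suite, here is a rough Python sketch of a timing-tolerant check. It is not the actual test code: `count_matching` and `check_health_log` are hypothetical helpers, and the minimum of 1 is an assumption (the manually triggered run is the only one guaranteed to have happened).

```python
def count_matching(log_lines, needle):
    """Count the log lines that contain the expected message."""
    return sum(1 for line in log_lines if needle in line)


def check_health_log(log_lines, needle, minimum=1):
    """Pass when at least `minimum` matching entries exist.

    An exact comparison would fail whenever the timer-driven run
    has not happened yet, or has happened one extra time.
    """
    actual = count_matching(log_lines, needle)
    if actual < minimum:
        raise AssertionError(
            f"expected at least {minimum} matching entries, got {actual}"
        )
    return actual
```

The trade-off is that a lower bound no longer catches unexpected extra entries.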

Honny1 avatar Sep 24 '24 19:09 Honny1

@edsantiago PTAL, I've modified the tests according to your suggestions.

Honny1 avatar Sep 25 '24 12:09 Honny1

Ephemeral COPR build failed. @containers/packit-build please check.

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Honny1, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [Luap99,edsantiago]

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Sep 26 '24 11:09 openshift-ci[bot]