Fix: Do not emit health_status event for each health check attempt.
Emit health_status event only if there is a change in health_status
Fixes #24003
Does this PR introduce a user-facing change?
Emit `health_status` event only when there is a status/state change for the `healthy` flag
Cockpit tests failed for commit 2771216d55d7fec0e86642e774fdc73f59d4a409. @martinpitt, @jelly, @mvollmer please check.
This "breaks" cockpit-podman's testHealthcheckSystem, which assumes the current behavior of getting regular "health check ran and passed" events. I don't have a good gut feeling whether the regular "pings of life" are expected/by design or considered noise. So I'll defer to the review of the podman developers, and if this change is approved, I'll adjust cockpit-podman's tests.
@Luap99
I don't think there is a formal design for how it should work, but the way it works today is how current users will expect it to work.
I have to agree on this, because I searched for such documentation (https://docs.docker.com/reference/api/engine/version/v1.41/#tag/System/operation/SystemEvents) and there is no description of the exact behavior of those events; they have only provided the list of valid events.
The fact that it breaks cockpit testing is a good sign that we have users depending on it. As such, I consider this a breaking change that is not suitable for a minor version, so it would have to wait for Podman 6.0 if we want to do it at all.
IMHO, Podman has a huge opportunity as a secure Docker replacement, and in that sense most users (and applications targeting Docker, e.g. Traefik) will expect it to work like Docker. For the user base looking to switch to Podman, this should be considered a bug. Please note that I am not referring to the podman CLI tool here; I am talking about the Docker-compatible API which Podman provides.
Now, one thing we should consider: if Docker doesn't behave this way, our Docker compat API should not behave this way either. One way would be to add a new field to the health_status event that is set when the status changed; then we can filter out the events that do not have it set, so that at least Docker clients work correctly.
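A minimal sketch of what that suggestion could look like (the struct shape and field name here are illustrative assumptions, not podman's actual definitions in libpod/events):

```go
package events

// Event is a simplified stand-in for podman's event type; the real one
// in libpod/events has more fields. HealthStatusChanged is the proposed
// addition from the comment above.
type Event struct {
	Type   string // e.g. "container"
	Status string // e.g. "health_status"
	ID     string // container ID
	// HealthStatusChanged would be set only when the health state
	// actually transitioned (e.g. starting -> healthy), letting the
	// docker-compat API drop the no-change "ping" events.
	HealthStatusChanged bool
}
```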
Exactly. This is the point I was trying to convey. I checked how podman-docker works, but it turns out to be a simple bash wrapper script around the podman CLI (at least on my Fedora 40).
So, if my understanding above is correct (i.e., podman-docker is just an alias/wrapper for podman), then podman-docker is not exposing a separate Docker-compatible API (either via a unix socket or via TCP).
All we have is the API exposed by podman, and it is expected to be Docker-compatible.
So, in short, my bug report is not about the behavior of the podman CLI tool; it is about the Docker-compatible API which podman exposes.
If any of you can provide some hints on how to fix this issue without changing the behaviour of the podman CLI, I am happy to try that. To be specific, my question is: at which point is the Go event converted to HTTP API data?
I will try to add the additional flag as you mentioned in the comment.
IMO, this should not be the default. I would not mind adding a config field in containers.conf to enable this more minimal events output, but we should not (and, as @Luap99 pointed out, cannot without a major version) do this by default.
@harish2704 Our API socket is split in two parts: the normal Docker API endpoints and our libpod endpoints, which all start with /version/libpod/..., so all the other Docker-compatible endpoints can and should be changed to match the Docker API as closely as possible.
Look into pkg/api/handlers/compat/events.go; we use that code for both endpoint trees, but if you look at the logic there you will find a !utils.IsLibpodRequest(r) check, so based on that you can change the behavior there. Now, of course, because the event stream cannot know when such a healthcheck state change happens, you need to add this info into the event itself so that you can filter on it there, which is why I suggested adding this field or attribute to the event type.
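Under the assumption that the event carries the flag sketched above, the compat-only filter could look roughly like this (a sketch, not podman's actual handler code; `utils.IsLibpodRequest` is the real helper from pkg/api/handlers/utils, everything else, including the import path version and the Event type, is simplified):

```go
import (
	"encoding/json"
	"io"
	"net/http"

	"github.com/containers/podman/v5/pkg/api/handlers/utils"
)

// streamEvents sketches the filtering inside the shared events handler
// in pkg/api/handlers/compat/events.go, which serves both endpoint trees.
// Event is the simplified type from the earlier sketch.
func streamEvents(w io.Writer, r *http.Request, eventChannel <-chan Event) {
	enc := json.NewEncoder(w)
	for event := range eventChannel {
		// Libpod clients keep today's behavior: one event per health
		// check run. Only docker-compat requests get the filtered stream.
		if !utils.IsLibpodRequest(r) &&
			event.Status == "health_status" &&
			!event.HealthStatusChanged { // hypothetical field, see above
			continue // suppress no-change health pings for docker clients
		}
		_ = enc.Encode(event) // the real handler also flushes per event
	}
}
```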
Cockpit tests failed for commit 5b8d32d26bc1eebc45d82b40e9866a0459a6a500. @martinpitt, @jelly, @mvollmer please check.
Cockpit tests failed for commit 21b6aa68c0f65c4385fad0941c08fa14e9de8dfb. @martinpitt, @jelly, @mvollmer please check.
@Luap99 I made the changes as you suggested. In short, they are:
- Added an extra flag to the Event struct to indicate whether the health status changed or not
- Marshal & parse that flag while writing to the journald log
- Check that flag before writing/publishing an event in the Docker-compatible API. Other APIs are not touched.

i.e., if we run `podman events` we still see the old behavior with many log lines, but if we run `env DOCKER_HOST=unix://<path-to-podman.sock> docker events` we no longer see a flood of health check logs. Events are emitted only when there is a state change.
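For the journald round-trip in the second bullet above, the idea is just to serialize the flag as an extra string field on the journal entry and parse it back when reading. A self-contained sketch under assumed names (podman's real journald backend and its field names differ):

```go
package main

import (
	"fmt"
	"strconv"
)

// healthChangedField is an illustrative journal field name; podman's
// journald backend defines its own PODMAN_* fields and this exact one
// is an assumption.
const healthChangedField = "PODMAN_HEALTH_STATUS_CHANGED"

// writeFlag stores the flag in a journald-style string field map.
func writeFlag(fields map[string]string, changed bool) {
	fields[healthChangedField] = strconv.FormatBool(changed)
}

// readFlag parses the flag back; entries written before this change
// lack the field and are treated as "no state change".
func readFlag(fields map[string]string) bool {
	v, ok := fields[healthChangedField]
	if !ok {
		return false
	}
	changed, err := strconv.ParseBool(v)
	return err == nil && changed
}

func main() {
	fields := map[string]string{}
	writeFlag(fields, true)
	fmt.Println(readFlag(fields)) // prints: true
}
```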
Cockpit tests failed for commit 3ec4a744dbd9f95e5667b242f58ec110dc4049bc. @martinpitt, @jelly, @mvollmer please check.
Cockpit tests failed for commit cecdca7c4e581dd02c58dbfca5f4831d261afbe3. @martinpitt, @jelly, @mvollmer please check.
Cockpit tests failed for commit 4597408a4c4eeda07eb96562143921f9c807ec97. @martinpitt, @jelly, @mvollmer please check.
A friendly reminder that this PR had no activity for 30 days.
@harish2704 Do you still want to work on it, or can I take over?
@Honny1 It will get completed quickly if a member of the podman team takes it over, so I would recommend that.
As for me, I may need help/guidance from the team to identify what to do and where to do it (e.g., I got stuck at https://github.com/containers/podman/pull/24005#discussion_r1780408681).
FYI, I am actually running this particular patch in a staging (i.e., non-critical) environment where I host ~60 containers using Coolify, but I don't have much insight into the podman codebase.
@harish2704 I will ping them.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: harish2704
Once this PR has been reviewed and has the lgtm label, please assign baude for approval. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@Luap99, @Honny1 I have updated this PR against the latest main branch. Can you please review the changes?
Cockpit tests failed for commit 0354fed5c79f56c46a5a4c6525d348de14963b3c. @martinpitt, @jelly, @mvollmer please check.
This "breaks" the cockpit-podman tests in the obvious way -- instead of the expected series of positive health checks, there is now just one. I'm happy to adjust the tests if this gets a positive review from the podman authors.
This "breaks" the cockpit-podman tests in the obvious way -- instead of the expected series of positive health checks, there is now just one. I'm happy to adjust the tests if this gets a positive review from the podman authors.
Are you consuming the compat API or the libpod API event stream? Like I mentioned in https://github.com/containers/podman/pull/24005#issuecomment-2361630358, this requires the use of !utils.IsLibpodRequest(r) so we don't break the libpod API, which would be an unacceptable break. The cockpit API test should not have to change unless we do a new major release.
@martinpitt, @Luap99: To be honest, I really forgot about the !utils.IsLibpodRequest(r) thing while making this PR,
but now I did the following test from my end:
- After applying this patch, started 3 containers in which a health check runs every second.
- Opened one terminal with the `podman events` command. I was able to see log output every second (3 events per second for the 3 containers).
- In another terminal, ran the `docker events` command. Docker is actually communicating with the podman service over podman.sock. I saw log output only when the health status changed.

I hope the behavior described above indicates that this PR is not breaking the libpod API. Is that correct?
(The compose files and other resources I used for testing can be found here: https://github.com/containers/podman/issues/24003#issue-2534255215)
@Luap99:
> This "breaks" the cockpit-podman tests in the obvious way -- instead of the expected series of positive health checks, there is now just one. I'm happy to adjust the tests if this gets a positive review from the podman authors.
> Are you consuming the compat API or the libpod api event stream?
current cockpit-podman listens to /v1.12/libpod/events as per https://docs.podman.io/en/stable/_static/api.html#tag/system/operation/SystemEventsLibpod . I'm not sure which of the two that is?
I suppose at some point we should bump the v1.12, but we still support e.g. podman 3.4 in Ubuntu 22.04 LTS, or podman 4.0 in RHEL 8. TBH we haven't paid attention to bumping this, it's mostly "our tests will tell us when v1.12 stops working".
Thanks!
@Luap99: Sorry, please ignore my review request; it was made accidentally. I was trying to view the changes you requested; I saw a message like "Luap99 requested changes" but couldn't find what it refers to.
@Luap99:
> This "breaks" the cockpit-podman tests in the obvious way -- instead of the expected series of positive health checks, there is now just one. I'm happy to adjust the tests if this gets a positive review from the podman authors.
> Are you consuming the compat API or the libpod api event stream?
> current cockpit-podman listens to /v1.12/libpod/events as per https://docs.podman.io/en/stable/_static/api.html#tag/system/operation/SystemEventsLibpod . I'm not sure which of the two that is?
Everything with vX/libpod/ is a libpod endpoint, so you are using the correct one. The paths that do not start with libpod/ are considered the Docker compat API; we try to match the Docker API interface there, so if there is a bug that is not right for Docker we might decide to change it without a major version.
> I suppose at some point we should bump the v1.12, but we still support e.g. podman 3.4 in Ubuntu 22.04 LTS, or podman 4.0 in RHEL 8. TBH we haven't paid attention to bumping this, it's mostly "our tests will tell us when v1.12 stops working".
Oh well, that version doesn't make any real sense to us. Generally we are tracking the podman version, so you should be using something like v5.0.0. The thing is, our API versioning is a mess and we seem to not really validate the version at all. In general we have almost no "do X only if API version > Y" code. There are a few cases, but I doubt they would matter to cockpit; in case someone is interested in looking at them, just grep for SupportedVersion(.
The most obvious case are the network endpoints
$ curl --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock -X GET http://d/v1.12/libpod/networks/json -H "Content-Type: application/json"
{"cause":"given version is not supported","message":"failed to parse query parameter 'version': \"1.12.0\": given version is not supported","response":400}
Thanks @Luap99 ! I sent https://github.com/cockpit-project/cockpit-podman/pull/2185 to bump the API version to v3.4.0/ which is clearer indeed, even if it makes little or no practical difference.
@Luap99 @martinpitt : Do I have to do anything from my end on this PR?
> @Luap99 @martinpitt : Do I have to do anything from my end on this PR?
You don't have to do much, but there are a few small things to address. The CI should be passing, so let's aim for that. It looks like there are some minor linter nits that need fixing. Also, please make sure to use !utils.IsLibpodRequest(r) where appropriate.
> @Luap99 @martinpitt : Do I have to do anything from my end on this PR?
Note @martinpitt is a maintainer of cockpit and not podman; he just comments here if he sees changes that break the cockpit-podman tests, which also helps us avoid breaking them with unwanted API changes.
@harish2704 CI is not passing here; see the job errors. The linter is complaining (https://github.com/containers/podman/pull/24005/checks?check_run_id=45370701706), so that must be fixed.
I have not yet reviewed this properly but will try to do so later this week.
Cockpit tests failed for commit 38c8c754adf8ae9f99bc5b90732fae4e091350b8. @martinpitt, @jelly, @mvollmer please check.
Cockpit tests failed for commit 793d1db131193eb8cf9bd9ec267a5e830951eb7f. @martinpitt, @jelly, @mvollmer please check.