nomad-driver-podman
fix container stats endpoint response handling
Podman 4.1.1 changed the stats endpoint's HTTP status for stopped containers from 404 to 200.
fix #182
Were you able to figure out the error log message? I still do not know where it comes from. If I execute the podman commands from the console, everything works fine.
I am not sure this is working. I pulled the PR and built it for testing on my test cluster and it is still leaving containers after jobs have been stopped.
@DemonicTutor I have not looked into the error message specifically but it says
cannot get cgroup path unless container […] is running: container is stopped
which, to me, sounds like Nomad is trying to read the cgroup of a stopped container because, due to #182, it thinks the container is still running when it really is already stopped. So that error might go away with this fix.
I can confirm this resolves this issue for me, RockyLinux 9, podman 4.1.1, nomad 1.3.3
Hello there, I would like to give my feedback on this PR too: I merged it into the main branch at commit 4efeb99d977c642a8aad85403f9b3f5d05256e1c, and the compiled driver works as expected with Nomad 1.3.3 and Podman 4.2.0 (Fedora IoT 36.20220822.0).
Edit: I should point out that with the "official" 0.4.0 driver in the same environment, if I upload a new version of a particular job or stop a job, the Nomad client hangs after stopping the pre-existing allocation until I manually remove the stopped containers with the podman CLI. This does not happen with the driver compiled as described above.
Hi @dermetfan, thanks for the PR! I gave this a try, and stopping a container seems to work.
However, I noticed what looks like a similar side effect of Podman's breaking API change, but we can open another issue for that:
2022-08-31T09:36:59.652-0500 [WARN] client.driver_mgr.nomad-driver-podman: Could not remove container: driver=podman @module=podman container=7968dcd1e3e2d0574251c1ca06b632792a720f46b7c8eb7bff425b8c852befa9 error="cannot delete container, status code: 200" timestamp=2022-08-31T09:36:59.651-0500
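
The warning suggests the driver rejects a 200 from the delete endpoint. As a rough sketch of one possible fix, and assuming the current check only accepts a single specific success code (I have not verified this in the driver source), the check could accept any 2xx status instead. `checkDelete` is a hypothetical helper name, not an existing function in the driver.

```go
package main

import (
	"fmt"
	"net/http"
)

// checkDelete is a hypothetical helper: it treats every 2xx status as a
// successful delete instead of insisting on one particular code.
func checkDelete(status int) error {
	if status >= http.StatusOK && status < http.StatusMultipleChoices {
		return nil
	}
	return fmt.Errorf("cannot delete container, status code: %d", status)
}
```

With this, both a 200 and a 204 response count as success, while 4xx and 5xx responses still surface the same error message as in the log above.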