nomad-driver-podman

fix container stats endpoint response handling

Open dermetfan opened this issue 2 years ago • 4 comments

Podman 4.1.1 changed the stats endpoint's HTTP status for stopped containers from 404 to 200.

fix #182

dermetfan • Jul 21 '22 11:07

CLA assistant check
All committers have signed the CLA.

hashicorp-cla • Jul 21 '22 11:07

Were you able to figure out the error log message? I still do not know where it comes from; if I execute the podman commands from the console, it's fine.

DemonicTutor • Jul 21 '22 14:07

I am not sure this is working. I pulled the PR and built it for testing on my test cluster, and it still leaves containers behind after jobs have been stopped.

jdoss • Jul 21 '22 23:07

@DemonicTutor I have not looked into the error message specifically, but it says

cannot get cgroup path unless container […] is running: container is stopped

which, to me, sounds like Nomad is trying to read the cgroup of a stopped container because, due to #182, it thinks the container is still running when it really is already stopped. So that error might go away with this fix.

dermetfan • Jul 22 '22 18:07
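The reasoning above can be illustrated with a simplified polling loop. In this hypothetical sketch (`pollStats` and `errStopped` are illustrative names, not code from the driver or from Nomad), stats collection only ends when a fetch reports the container as stopped. If a stopped container is misreported as healthy, as #182 describes, the loop keeps polling, which is the kind of situation in which something keeps asking for the cgroup of a container that is no longer running. The `maxPolls` cap stands in for the rest of the task lifecycle so the example terminates.

```go
package main

import (
	"errors"
	"time"
)

// errStopped is a hypothetical sentinel returned by fetch once the
// container is known to be stopped.
var errStopped = errors.New("container is stopped")

// pollStats calls fetch every interval until fetch returns errStopped or
// maxPolls (assumed >= 0) calls have been made. Other errors are treated as
// transient and polling continues. It returns the number of calls made and
// whether the loop ended because the container was reported as stopped.
func pollStats(fetch func() error, interval time.Duration, maxPolls int) (int, bool) {
	for polls := 1; polls <= maxPolls; polls++ {
		if err := fetch(); errors.Is(err, errStopped) {
			return polls, true
		}
		time.Sleep(interval)
	}
	return maxPolls, false
}
```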

I can confirm this resolves the issue for me on Rocky Linux 9, Podman 4.1.1, and Nomad 1.3.3.

zandeez • Aug 19 '22 08:08

Hello there, I would also like to give my feedback on this PR: I merged it into the main branch at commit 4efeb99d977c642a8aad85403f9b3f5d05256e1c, and the compiled driver works as expected with Nomad 1.3.3 and Podman 4.2.0 (Fedora IoT 36.20220822.0).

Edit: I should point out that with the "official" 0.4.0 driver in the same environment, if I upload a new version of a particular job or stop a job, the Nomad client hangs after stopping the pre-existing allocation until I manually remove the stopped containers with the podman CLI. This does not happen with the driver compiled as described above.

Procsiab • Aug 24 '22 07:08

Hi @dermetfan, thanks for the PR! I gave this a try, and while stopping a container seems to work, I noticed what looks like a similar side effect of Podman's breaking API change. We can open another issue for that:

2022-08-31T09:36:59.652-0500 [WARN]  client.driver_mgr.nomad-driver-podman: Could not remove container: driver=podman @module=podman container=7968dcd1e3e2d0574251c1ca06b632792a720f46b7c8eb7bff425b8c852befa9 error="cannot delete container, status code: 200" timestamp=2022-08-31T09:36:59.651-0500

shoenig • Aug 31 '22 14:08