firecracker-containerd icon indicating copy to clipboard operation
firecracker-containerd copied to clipboard

Shim Connect method returns VM-internal information

Open sipsma opened this issue 5 years ago • 4 comments

The Connect method served by a v2 runtime service is supposed to provide the Shim's PID and a given Task's PID.

Right now, our host-side runtime shim just forwards the request to the VM Agent, so the results returned are specific to the VM, which is a bit surprising for a caller outside the VM to receive (they can't really do much with a PID that exists inside the VM): https://github.com/firecracker-microvm/firecracker-containerd/blob/c0abc8b8ec0d6045f0ed72b2513e9dea13d213c7/runtime/service.go#L938-L950

At the time of this writing, the Connect API doesn't appear to be exposed to containerd clients; it does however appear to be used by containerd internally in order to reconnect to a shim after the containerd service restarts.

There needs to be more investigation to determine the proper fix here. I suspect it would make sense to at least return the host-side Shim PID instead of the VM Agent PID, however it's less immediately clear what we should return as the Task PID since there is not a host-side PID for the task. Depending on how containerd ends up actually using that Task PID it may or may not make sense to just return the PID of the VM as the Task PID.

sipsma avatar Jun 13 '19 21:06 sipsma

Yeah. Not so sure what people want to do with the PIDs. The .proto file doesn't explain the semantics much.

https://github.com/containerd/containerd/blob/faec567304bbdf6864b1663d4f813641b5880a4a/runtime/v2/task/shim.proto#L168

kzys avatar Jul 17 '19 18:07 kzys

Yeah. Not so sure what people want to do with the PIDs. The .proto file doesn't explain the semantics much.

https://github.com/containerd/containerd/blob/faec567304bbdf6864b1663d4f813641b5880a4a/runtime/v2/task/shim.proto#L168

Yeah, just going by its use in their code, containerd internally appears to use it to check if a shim is still alive after the containerd daemon restarts.

If they exposed it to clients in the future (I don't think it is currently) I'd suspect it could be used for monitoring the process so they can know directly if something goes wrong and the shim exits unexpectedly or something like that.

I'm now also wondering if in the short term we should replace the implementation of this method with one that returns a ErrNotImplemented, which the containerd runtimev2 docs suggest is the right thing to do. Once we figure out an actual solution to this problem, we can replace it with a different implementation if needed.

sipsma avatar Jul 17 '19 18:07 sipsma

Now we return ErrNotImplemented instead.

I'd like keep this issue open until we figure out a long-term solution (or decide that returning the error is the long-term solution).

kzys avatar Jul 26 '19 22:07 kzys

It's probably the time to formally address this issue now for containerd v1.6.0 release.

The task creation logic here needs a PID from shim. And the new logic in shim layer will call Connect API to fetch the PID, as opposed to the old way.

This essentially breaks the entire task creation process in fc-cd.

The error is:

# docker run --rm -it --privileged --ipc=host --volume /dev:/dev --volume /run/udev/control:/run/udev/control --volume /home/ec2-user/devel/my-fccd/examples/etc/containerd/firecracker-runtime.json:/etc/containerd/firecracker-runtime.json --volume /home/ec2-user/devel/my-fccd/examples/logs:/var/log/firecracker-containerd-test --volume /home/ec2-user/devel/my-fccd/examples/..:/src --env FICD_DM_VOLUME_GROUP= --env FICD_DM_POOL=testpool --env EXTRAGOARGS="" --workdir="/src/examples" localhost/firecracker-containerd-test:latest "make examples && make testtap && sleep 3 && ./taskworkflow -ip 172.16.0.2/24 -gw 172.16.0.1 -ss devmapper"
make: Nothing to be done for 'examples'.
ip link add br0 type bridge
ip tuntap add tap0 mode tap
ip link set tap0 master br0
ip link set dev tap0 up
ip link set dev br0 up
ip addr add dev br0 172.16.0.1/24
2022/02/25 04:23:15.189142 Creating containerd client
2022/02/25 04:23:15.189855 Created containerd client
2022/02/25 04:23:19.634736 Successfully pulled docker.io/library/nginx:1.17-alpine image with devmapper
2022/02/25 04:23:30.035466 failed to stop VM, err: rpc error: code = Internal desc = the VMM was killed forcibly: 1 error occurred:
	* signal: terminated


2022/02/25 04:23:30.035517 creating task: failed to get task pid: not implemented: unknown

henry118 avatar Feb 25 '22 18:02 henry118