Documentation: Protocol clarification
I implemented an NRI plugin in Rust and noticed that the documentation is not quite clear about the underlying protocol. It currently states:
The core of NRI is defined by a protobuf protocol definition of the low-level plugin API. The API defines two services, Runtime and Plugin.
Unfortunately that is really only half the story. Firstly it would be nice to mention that it is using ttrpc instead of grpc, but more importantly it should be stated that there is the multiplexer layer underneath it, with an implicit 1 meaning PluginServer, and 2 being RuntimeClient.
The last issue I encountered is that there appears to be no indication which Events can be subscribed to, but simply selecting all Events may result in the plugin registration failing.
The last issue I encountered is that there appears to be no indication which Events can be subscribed to, but simply selecting all Events may result in the plugin registration failing.
Regarding the registration error... Do you see registration failing with something like this or a corresponding error from the runtime side ?
ERROR [0000] Plugin subscribed for unhandled events UpdatePodSandbox,PostUpdatePodSandbox (0x1800)
If you do, it is caused by those two events being recently introduced to NRI without any of the runtimes implementing or even knowing about them. It's a temporary mismatch between the two until the runtimes implement them.
In the golang implementation of a plugin you'd normally simply use an EventMask 0 for subscription. Then the plugin stub checks which event handlers the plugin implements and generates a corresponding event mask and subscribes the plugin for those events.
Now, for the time being, if your implementation uses the all/ValidEvents mask for subscription, you should mask out the two events mentioned in the failure message. That should get rid of the error.
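For a plugin written without the Go stub, the masking described above could look roughly like the sketch below. The value 0x1800 is taken from the error message. Everything else is an assumption made for illustration: the names, which of the two bits belongs to which event, and that the full mask covers 13 events (0x1FFF). Check the protocol definition for the actual event numbering.

```rust
// Bits reported as unhandled in the error message (0x1800 = bit 11 | bit 12).
// Assumption: the order of the names in the message matches the bit order.
const UPDATE_POD_SANDBOX: u32 = 1 << 11;
const POST_UPDATE_POD_SANDBOX: u32 = 1 << 12;
const UNHANDLED_BY_RUNTIME: u32 = UPDATE_POD_SANDBOX | POST_UPDATE_POD_SANDBOX;

// Assumption: 13 events in total, occupying bits 0 through 12.
const ALL_EVENTS: u32 = 0x1FFF;

/// Builds an event mask from the bit positions of the events the plugin
/// actually handles, which is the idea behind subscribing per handler.
fn mask_of(bit_positions: &[u32]) -> u32 {
    bit_positions.iter().fold(0, |mask, bit| mask | (1u32 << *bit))
}

/// Removes the events the runtime does not know about yet, so that
/// subscribing with a broad mask does not fail registration.
fn subscription_mask(wanted: u32) -> u32 {
    wanted & !UNHANDLED_BY_RUNTIME
}
```

Calling `subscription_mask(ALL_EVENTS)` clears only the two offending bits and leaves the rest of the mask untouched, so the same helper keeps working once the runtimes support those events and the constant is set to zero.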
Unfortunately that is really only half the story. Firstly it would be nice to mention that it is using ttrpc instead of grpc
This is mentioned in the README, but we can add a related comment to the protocol description, too.
but more importantly it should be stated that there is the multiplexer layer underneath it, with an implicit 1 meaning PluginServer, and 2 being RuntimeClient.
Yes, true. That's really missing from the documentation.
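Until that is documented, here is a minimal sketch of what such a multiplexer layer might look like from the plugin side. The connection IDs 1 (plugin service) and 2 (runtime service) come from the discussion above. The frame layout is an assumption and must be verified against the NRI multiplexer source: the sketch uses an 8-byte header holding a big-endian u32 connection ID followed by a big-endian u32 payload length. All names are hypothetical.

```rust
// Connection IDs as described in this thread.
const PLUGIN_SERVICE_CONN: u32 = 1;
const RUNTIME_SERVICE_CONN: u32 = 2;

// Assumed header: 4 bytes connection ID + 4 bytes payload length, big-endian.
const HEADER_LEN: usize = 8;

/// Wraps a payload (for example one ttrpc message) in a multiplexer frame.
fn encode_frame(conn_id: u32, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(HEADER_LEN + payload.len());
    frame.extend_from_slice(&conn_id.to_be_bytes());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Splits one frame off the front of `buf`.
/// Returns (connection ID, payload, remaining bytes), or None if `buf`
/// does not yet contain a complete frame.
fn decode_frame(buf: &[u8]) -> Option<(u32, &[u8], &[u8])> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let conn_id = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let len = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]) as usize;
    let end = HEADER_LEN.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    Some((conn_id, &buf[HEADER_LEN..end], &buf[end..]))
}
```

A plugin would then route payloads decoded with ID `PLUGIN_SERVICE_CONN` to its ttrpc server and send its own requests to the runtime in frames tagged `RUNTIME_SERVICE_CONN`.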
Regarding the registration error... Do you see registration failing with something like this or a corresponding error from the runtime side ?
On the runtime side, I see an UnexpectedEof, which is also suboptimal. Returning an error would be nice (though I do not know how that could be implemented with ttrpc), but it is not the biggest issue if it stems from a programmer "error".
ERROR [0000] Plugin subscribed for unhandled events UpdatePodSandbox,PostUpdatePodSandbox (0x1800)
Kind of:
ERRO[2025-04-10T16:25:59.796301091+02:00] failed to start external plugin: invalid plugin events: 0x1800
If you do, it is caused by those two events being recently introduced to NRI without any of the runtimes implementing or even knowing about them. It's a temporary mismatch between the two until the runtimes implement them.
That is what I figured out myself then. But there should either be a way to discover the supported events, or at the very least a comment in the protocol definition à la: Only available in containerd >= 2.0.5 or cri-o 1.2.3. I would strongly prefer the discoverable way with, for example, the ConfigureRequest containing a list of supported events by the CRI.
In the golang implementation of a plugin you'd normally simply use an EventMask 0 for subscription. Then the plugin stub checks which event handlers the plugin implements and generates a corresponding event mask and subscribes the plugin for those events.
I do not plan to ever use the Golang implementation, so the protocol definition should be written in a language-agnostic way, without requiring readers to consult the code to find out how it is implemented. If you feel this is too much of a burden at this early stage, I am also fine with that. But it is something to keep in mind before going "productive", whatever that means considering it is now enabled by default in containerd.