Krisztian Litkey
@yylt Is that a reproducible problem in your environment? Can you give a bit more detail? I'd be interested at least in the number of pods and containers...
And how many containers do you have altogether in those pods?
Not per pod: the total number of containers across all pods. I assume we hit the ttrpc `messageLengthMax` limit with the sync request, so what matters is both the total...
Also, it would be interesting to see the output of these:
- `crictl pods -o json | wc -c`
- `crictl ps -o json | wc -c`
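A rough way to read those two numbers is to compare their sum against the ttrpc message cap. The sketch assumes the cap is 4 MiB (`4 << 20`), which is the usual value of ttrpc's `messageLengthMax`; verify it against the ttrpc version vendored into your containerd. `syncLikelyTooLarge` is a hypothetical helper, and since the real sync request is protobuf-encoded rather than JSON, this is only an order-of-magnitude check.

```go
package main

import "fmt"

// assumedMessageLengthMax is an assumption: 4 MiB, the value ttrpc
// commonly uses for messageLengthMax. Check your vendored ttrpc.
const assumedMessageLengthMax = 4 << 20

// syncLikelyTooLarge (hypothetical) treats the byte counts from
// `crictl pods -o json | wc -c` and `crictl ps -o json | wc -c`
// as a proxy for the size of the initial NRI sync request.
func syncLikelyTooLarge(podsJSONBytes, containersJSONBytes int) bool {
	return podsJSONBytes+containersJSONBytes > assumedMessageLengthMax
}

func main() {
	// Illustrative inputs only, not measurements from any real node.
	fmt.Println(syncLikelyTooLarge(1_200_000, 2_500_000)) // sum below 4 MiB
	fmt.Println(syncLikelyTooLarge(1_500_000, 3_000_000)) // sum above 4 MiB
}
```

A result near or above the cap would be consistent with the sync request hitting the limit, but only the actual encoded request size is conclusive.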
@yylt I have a branch with a fix that kicks a plugin out if synchronization fails, which alone would make the failure more graceful...
@yylt Yes, but I have a directly patched 1.17.16 containerd tree pointing at that NRI version and re-vendored here, so it's easier to just compile and use that... https://github.com/klihub/containerd/tree/fixes/yylt-sync-failure
Oh, and you will need to recompile your plugin against that NRI tree as well. Otherwise the runtime side will detect that the plugin does not have the necessary support compiled...
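The sketch below illustrates the general idea of a runtime refusing a plugin that lacks required compiled-in support. The feature name, `pluginInfo`, and `checkPlugin` are all hypothetical; how NRI actually detects this is not described here and may work differently.

```go
package main

import "fmt"

// requiredFeature is a hypothetical capability name used only for this sketch.
const requiredFeature = "hypothetical-sync-support"

// pluginInfo is a hypothetical record of what a plugin advertises.
type pluginInfo struct {
	Name     string
	Features map[string]bool
}

// checkPlugin rejects a plugin that does not advertise requiredFeature.
// Looking up a key in a nil map returns false, so plugins that
// advertise nothing are rejected too.
func checkPlugin(p pluginInfo) error {
	if !p.Features[requiredFeature] {
		return fmt.Errorf("plugin %q was built without %q; rebuild it against the matching NRI tree",
			p.Name, requiredFeature)
	}
	return nil
}

func main() {
	oldPlugin := pluginInfo{Name: "old"}
	newPlugin := pluginInfo{Name: "new", Features: map[string]bool{requiredFeature: true}}
	fmt.Println(checkPlugin(oldPlugin))
	fmt.Println(checkPlugin(newPlugin))
}
```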
> After replacing both `nri-daemon` and `containerd`, the `sync error` no longer occurs upon restart.
>
> If `nri-daemon` is replaced individually, the issue still persists.

Yes, that is the...
A fix for the oversized initial synchronization message has been merged (229236e40a59a1aa91759b6e2449663f0aac7ad9).
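One general way to keep an initial synchronization under a per-message cap is to split it into several parts, each flagged with whether more parts follow. The sketch shows that technique with hypothetical types (`part`, `splitSync`) and a caller-supplied size estimate; it is not a description of what the merged commit or #111 actually does.

```go
package main

import "fmt"

// part is a hypothetical sync chunk: a batch of container IDs plus a
// flag telling the receiver whether more parts follow.
type part struct {
	Containers []string
	More       bool
}

// splitSync packs containers into parts whose estimated size stays at
// or below maxBytes, using sizeOf to estimate one container's encoded
// size. A single oversized container still gets a part of its own.
// The last part always has More == false, even when the input is empty.
func splitSync(containers []string, maxBytes int, sizeOf func(string) int) []part {
	var parts []part
	var cur []string
	curSize := 0
	for _, c := range containers {
		s := sizeOf(c)
		if len(cur) > 0 && curSize+s > maxBytes {
			parts = append(parts, part{Containers: cur, More: true})
			cur, curSize = nil, 0 // start a fresh batch with a new backing array
		}
		cur = append(cur, c)
		curSize += s
	}
	return append(parts, part{Containers: cur, More: false})
}

func main() {
	ids := []string{"c1", "c2", "c3", "c4", "c5"}
	for i, p := range splitSync(ids, 250, func(string) int { return 100 }) {
		fmt.Println(i, p.Containers, p.More)
	}
}
```

With a 250-byte budget and 100 bytes per container, this yields three parts: `[c1 c2]` and `[c3 c4]` with `More` set, then `[c5]` as the final part.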
Fixed by #111.