runtime-spec
runtime: Ambiguous namespace for state `pid` and `bundle`
The spec has:
`pid` (int, REQUIRED when `status` is `created` or `running`) is the ID of the container process, as seen by the host.
`bundle` (string, REQUIRED) is the absolute path to the container's bundle directory.
I'm assuming those are both “in the runtime namespace”. However, the “runtime namespace” wording needs to be tightened up in this case, because `create` and `state` may have been called from different namespaces. Is the `pid` value always in the `create` runtime namespace? Or when `state` is called from another PID namespace, is the container process ID translated into that PID namespace (if it is even visible)? The same ambiguity applies to `bundle` and mount namespaces.
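To make the PID ambiguity concrete: on Linux 4.1 and later, `/proc/<pid>/status` carries an `NSpid` field that lists the same process's ID in each nested PID namespace, starting with the namespace of the procfs mount being read. The Go sketch below parses that field from sample text. The helper name `parseNSpid` and the sample values are hypothetical and not part of the spec or of any runtime.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNSpid (hypothetical helper) extracts the NSpid field from the text of
// /proc/<pid>/status. The leftmost entry is the PID as seen from the PID
// namespace of the procfs mount; later entries are the PIDs in successively
// nested inner namespaces.
func parseNSpid(status string) ([]int, error) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "NSpid:"))
		pids := make([]int, 0, len(fields))
		for _, f := range fields {
			n, err := strconv.Atoi(f)
			if err != nil {
				return nil, fmt.Errorf("bad NSpid entry %q: %w", f, err)
			}
			pids = append(pids, n)
		}
		return pids, nil
	}
	return nil, fmt.Errorf("no NSpid field found")
}

func main() {
	// Made-up sample: one process that is PID 4242 in the outer namespace
	// and PID 1 inside the container's PID namespace.
	sample := "Name:\tsleep\nPid:\t4242\nNSpid:\t4242\t1\n"
	pids, err := parseNSpid(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pids)
}
```

Both values in the sample identify the same process, which is why a bare `pid` in the state output is only meaningful once the spec says which namespace it is relative to.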
I'd raised this issue on dev@ a while back, but with 1.0 approaching (#726), and dev@-based discussions being pretty quiet, I thought I'd cross-post here in case that helps with triage/prioritization.
I had some discussion about how a runtime could implement this in a namespace-agnostic way in opencontainers/runc#1224. I'm not sure if it makes sense to define this in the spec, but it's an interesting idea IMO.
Maybe we could make `state` return a path to a procfs directory, from which users can read and parse `stat` and other pseudo-files. Of course, this will massively break VM-based runtimes so it's probably not the best idea in the world.
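As a rough illustration of what a `state` caller would have to do with such a path, here is a minimal Go sketch that parses the first three fields of a procfs `stat` line. The type `procStat`, the function `parseStat`, and the sample line are hypothetical. Note that the PID in `stat` is itself relative to the PID namespace of the procfs mount, so this only moves the namespace question to the mount.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// procStat holds the first three fields of a /proc/<pid>/stat line.
type procStat struct {
	PID   int
	Comm  string
	State string
}

// parseStat (hypothetical helper) parses the pid, comm, and state fields.
// The comm field is wrapped in parentheses and may itself contain spaces or
// parentheses, so we split on the first '(' and the last ')'.
func parseStat(line string) (procStat, error) {
	start := strings.IndexByte(line, '(')
	end := strings.LastIndexByte(line, ')')
	if start < 0 || end < start {
		return procStat{}, fmt.Errorf("malformed stat line")
	}
	pid, err := strconv.Atoi(strings.TrimSpace(line[:start]))
	if err != nil {
		return procStat{}, fmt.Errorf("bad pid field: %w", err)
	}
	rest := strings.Fields(line[end+1:])
	if len(rest) == 0 {
		return procStat{}, fmt.Errorf("missing state field")
	}
	return procStat{PID: pid, Comm: line[start+1 : end], State: rest[0]}, nil
}

func main() {
	// Made-up sample line, truncated after a few fields.
	st, err := parseStat("4242 (my (odd) name) S 1 4242 4242 0 -1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("pid=%d comm=%q state=%s\n", st.PID, st.Comm, st.State)
}
```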
On Thu, Mar 16, 2017 at 07:00:45AM -0700, Aleksa Sarai wrote:
> Maybe if we make `state` return a path to a `procfs` directory…
That assumes that when the ‘state’ call succeeds, the ‘state’ caller will share a mount namespace (or enough of the mount contents) to be able to get to the right proc by following that path. This is basically the same issue we currently have with the ‘bundle’ value. For runC (which uses the filesystem to share state), this condition is likely satisfied. But other runtimes may have different approaches to sharing state (maybe they communicate with a state-registry daemon over TCP), so I don't think we can rely on this condition being satisfied in general.
> Of course, this will massively break VM-based runtimes so it's probably not the best idea in the world.
I don't think there's a useful way to supply PIDs with VM-based runtimes (or Windows, #459). I'm curious to see how those runtimes get certified if we cut 1.0 without addressing that point, but I think it's orthogonal to how we distribute reliable ‘pid’ and ‘bundle’ values on Linux.