runc icon indicating copy to clipboard operation
runc copied to clipboard

fix set config back error for runc update

Open lifubang opened this issue 2 years ago • 11 comments

Now, if we use runc update to update the resources limit of a container, runc will set configs back when there is an error. But, we can't set the config back for ever because the config and c.config have the same address. It will cause state.json and actual configs in an inconsistent state. For example:

root@acmcoder:/acmcoder/images# /home/lifubang/go/src/github.com/opencontainers/runc/runc update --cpuset-cpus 1000 --memory 1000000000 test
WARN[0000] Setting back cgroup configs failed due to error: failed to write "1000": write /sys/fs/cgroup/cpuset/test/cpuset.cpus: numerical result out of range, your state.json and actual configs might be inconsistent. 
ERRO[0000] failed to write "1000": write /sys/fs/cgroup/cpuset/test/cpuset.cpus: numerical result out of range

Signed-off-by: lifubang [email protected]

lifubang avatar Jan 29 '22 12:01 lifubang

There is a lot about config and runc update which needs to be reworked.

Can you add a test case please?

Using marshal/unmarshal seems weird. I know this is the easiest way to do a "deep copy", but maybe instead we should create something like func (c *Config) Copy and use it.

kolyshkin avatar Jan 31 '22 19:01 kolyshkin

There is a lot about config and runc update which needs to be reworked.

Yes, I think we should open an new issue thread to track this. With a simple think in my head, I can give a small list:

  1. Instead of set all resource limits at one time, we should set resource limits one controller by on controller. With this way, we can commit to state.json for each controller once the setting action succeed;
  2. Just only set resource limits for controllers(s) provided in the command by user, not all the controllers in cgroup;
  3. We should add a Mutual exclusion mechanism in the command runc update for each container;
  4. Support update device configuration;
  5. Get resource limits value from cgroup sub system file contents when loading container config.

Can you add a test case please?

Of course.

but maybe instead we should create something like func (c *Config) Copy and use it.

I take a quick look at containerd code, they use the same method marshal/unmarshal to do deep copy some complex structs. I think if we define a new Copy function, we need to write lots of code to copy each fields, once some fields changed in the future, we need to change the code in this Copy function. Do you really think it worth to do this? And I have another thought, if we write an new function to update intelRdt, we can change the param type from configs.Config to configs.Resources in container.Set. But I don't know whether we can change this interface or not.

Another question, do you know whether this command(runc update) is used by some upstream projects, such as containerd, or not? Or they have their own way to do resource limits update?

lifubang avatar Feb 01 '22 14:02 lifubang

I'm ok with using marshal/unmarshal for deep copy. The only thing is, maybe something like encoding/gob (which is kind of a protobuf simplified) is more efficient (and equally capable). OTOH its use may make the binary larger, so that has to be checked as well.

Another question, do you know whether this command(runc update) is used by some upstream projects, such as containerd, or not? Or they have their own way to do resource limits update?

I remember that cri-o is using runc update; @haircommander can shed some more light

kolyshkin avatar Feb 01 '22 23:02 kolyshkin

Hmm, I played with this and apparently using json for deep copy does not copy SystemdProps, since they have dbus.Variant which can't be marshalled as all its fields are private.

For the same reason, gob can't encode Config, but at least it provides some insight about why:

Error encoding: gob: type dbus.Variant has no exported fields

This means we have to implement marshal/unmarshal for either json or gob (or both) in dbus.

kolyshkin avatar Feb 03 '22 04:02 kolyshkin

apparently using json for deep copy does not copy SystemdProps

I read the code and found that the field SystemdProps is not provided by the run time spec config file, but be calculated from spec.Annotations. And the field spec.Annotations is serialized to state.json with the name labels, so may be we should re-calculate SystemdProps when we load state.json? Compare to your solution This means we have to implement marshal/unmarshal for either json or gob (or both) in dbus., which one is your prefer?

lifubang avatar Feb 04 '22 01:02 lifubang

OTOH its use may make the binary larger, so that has to be checked as well.

The size comparation of run binary:

make type json gob +percentage
dynamic 14644480 14939800 2.02%
static 13561768 13803688 1.78%

The increased size is about 288kb. And with the more efficient algorithm, the more robust error check method, how about replace json with gob? May be open an new pr to deal with this and SystemdProps?

lifubang avatar Feb 04 '22 02:02 lifubang

I think we can keep in using json.Marsharl, because there are many works to do to replace it with Gob, and dbus maintainer doesn’t want to implement json/gob interfaces. Most important of all, we have ignored the SystemdProps field when marshal a config to json.

lifubang avatar Feb 06 '22 16:02 lifubang

How about to do some refactoring jobs to remove SystemdProps field in config, and generate it from config.Labels before we call StartUnit? Because I think SystemdProps doesn’t really belong to the config, but some temporary values from other fields for communicating with systemd. @kolyshkin

lifubang avatar Feb 06 '22 16:02 lifubang

How about to do some refactoring jobs to remove SystemdProps field in config, and generate it from config.Labels before we call StartUnit? Because I think SystemdProps doesn’t really belong to the config, but some temporary values from other fields for communicating with systemd. @kolyshkin

So, this was done in specconv because this was we can really validate the spec and reject some invalid configuration (bad systemd properties) early in the process (rather than in manager.Apply, which happens than the container is already started).

I guess this is not that critical, and we can use some intermediate representation for systemd properties -- feel free to come up with something

kolyshkin avatar Feb 08 '22 03:02 kolyshkin

So (without having given it a huge amount of thought) I'm a bit on the fence if it's a good thing to implement this as a generic DeepCopy() utility; I guess "it works", but making it a generic utility also means that we can't make optimizations (if needed). If we want to keep it as a separate utility, perhaps it should live in config (or, as @kolyshkin mentioned) be added to the config itself.

Looking a slightly bit further, at how container.Config() is used:

It's used in runnerrun() and createExecFifo() https://github.com/opencontainers/runc/blob/43186447b99c81d7e12876dc8277e2f6fd538850/utils_linux.go#L267-L274 https://github.com/opencontainers/runc/blob/e4e2a9dda4647373a26a19e7d475eea573f09dbf/libcontainer/container_linux.go#L409-L416

Both of those seem to be better suited if the container had a container.HostRootUIDGID() (e.g.) function that returned both uid and gid; that would also fix a theoretical race where state could be mutated in between both calls.

Further uses (besides the update being worked in in this PR) are:

In TestFactoryLoadContainer() (only used to "inspect" the container config) https://github.com/opencontainers/runc/blob/eddf35e5462e2a9f24d8279874a84cfc8b8453c2/libcontainer/factory_linux_test.go#L171

In TestGetContainerStateAfterUpdate() - actually wondering (haven't verified) if that test is doing the right thing, as it's mutating the config (without making a copy), then "writing it back" with container.Set() https://github.com/opencontainers/runc/blob/eddf35e5462e2a9f24d8279874a84cfc8b8453c2/libcontainer/container_linux_test.go#L380-L382

From the above, I'm wondering if container.Config() should perhaps always return a copy?

Orthogonal (not something introduced in this PR);

  • I think runc update will always have a race condition, as to concurrent runc update calls would not take eachother into account. I guess there's no "easy" solution to that (other than file locking or introducing some "sequence" property).
  • (Definitely not related to this PR); Not a huge fan of the "rollback" in container.Set() (for the c.cgroupManager.Set()); I feel like that should've been the responsibility of c.cgroupManager.Set() itself.

thaJeztah avatar Feb 09 '22 15:02 thaJeztah

In TestGetContainerStateAfterUpdate() - actually wondering (haven't verified) if that test is doing the right thing, as it's mutating the config (without making a copy), then "writing it back" with container.Set()

Yes, that test doesn't do the right thing, the value of Memory will be always 2048! :worried:

lifubang avatar Feb 10 '22 16:02 lifubang