selinux icon indicating copy to clipboard operation
selinux copied to clipboard

uniqMcs use 100% cpu

Open ningmingxiao opened this issue 1 month ago • 9 comments

I find when create many selinux labels will cause uniqMcs use 100% cpu when try create more than 1024*(1024-1)/2=523776 labels.

for example
set DefaultCategoryRange = uint32(10) at github.com/opencontainers/selinux/go-selinux/selinux.go make it easier to happen.

package main

import (
	"fmt"

	"github.com/opencontainers/selinux/go-selinux"
)

func main() {
	for i := 0; i <1000; i++ {
		selinux.ContainerLabels()
		fmt.Printf("times %d \n",i)
	}
	fmt.Println("done")
}

can this bug assign a cve? the func never return and use 100 % cpu @kolyshkin @cyphar

ningmingxiao avatar Nov 25 '25 15:11 ningmingxiao

ping @AkihiroSuda @thaJeztah can you take a look?

https://github.com/opencontainers/selinux/pull/246 or https://github.com/opencontainers/selinux/pull/245 may fix.

ningmingxiao avatar Nov 26 '25 01:11 ningmingxiao

can this bug assign a cve?

If you'd like to have a security advisory and CVE created for this issue you would need to show how an attacker could trigger this issue. The description you've given here gives me the impression that it's just a regular bug.

cyphar avatar Nov 27 '25 05:11 cyphar

yes, it is a bug not cve. @cyphar can you review my pr

ningmingxiao avatar Nov 27 '25 05:11 ningmingxiao

I think I would prefer #245 but I don't quite understand what purpose uniqMcs is serving and if there is a nicer solution somewhere, I'll try to take a closer look when I have some time...

cyphar avatar Nov 27 '25 05:11 cyphar

Ah okay, so it seems like there's two issues here:

  1. There is an arbitrary limit on the number of categories (1024).
  2. In order to select a value for the categories, we try to generate random numbers but because of the selection criteria (no duplicates and the numbers must be in descending order) you end up with "only" 523776 possible labels. Once you hit this case, the selection code will infinite loop with the state locked so you end up with a livelock (even if another process would free a slot).

With that in mind, #245 doesn't seem to fix the issue very directly -- the idea is that we are hoping and waiting for releaseLabel to run so that we free up some label slots. I'm not sure how often that happens in practice and how long you might stall waiting for the slots to get freed -- it seems like a better idea to me to:

  1. Increase the category limit. Since they are stored in mount strings, it seems like a limit of 9999 would be a strict improvement with no real downside in terms of readability and mount string size (maybe 99999 would add a nice buffer). If we keep all of the other restrictions in place, this gives a label limit of 49985001 and 4999850001 respectively. I wonder if we could just remove the arbitrary limits entirely, but I'm sure there's a good reason for them.
  2. We should probably be using nicer methods for generating pseudo-random numbers than binary.Read(rand.Reader)...? n % catRange will suffer from modulo bias if we implement (1). "math/rand".Intn would be the obvious thing to use.
  3. If we need to keep the arbitrary limit, maybe an error is a better solution than blocking until a label is ready (i.e., #246)? (Sorry, this is opposite of what I said earlier, I thought this was a thundering herd issue based on your description, but it's actually just a normal ID exhaustion issue.)

But again, I'm not super familiar with this code yet. I would defer to other maintainers.

EDIT: I also question whether these values should be random in the first place -- we have a global state object that is locked when we operate on it so we should be able to assign them incrementally and thus avoid slowing down when we start using up a lot of the label space and deterministically know if they've all been assigned. (Though maybe they're random to mask the execution order? I'm not sure.)

cyphar avatar Nov 27 '25 06:11 cyphar

I wonder if we could just remove the arbitrary limits entirely, but I'm sure there's a good reason for them.

I was curious about the limit as well, and earlier this week tried to trace back where the 1024 limit originated from, which needed some tracking across multiple repositories, because this code moved around;

  • it started as a package in docker/docker (moby)
  • was moved to libcontainer in the docker/docker (moby) repository
  • which became runc
  • then got moved to this repository to become a separate module

In either case, the original 1024 limit looks to have been added in https://github.com/moby/moby/commit/4c4356692580afb3971094e322aea64abe0e2500 as part of this PR;

  • https://github.com/moby/moby/pull/4211

https://github.com/moby/moby/blob/4c4356692580afb3971094e322aea64abe0e2500/pkg/selinux/selinux.go#L356 https://github.com/moby/moby/blob/4c4356692580afb3971094e322aea64abe0e2500/pkg/selinux/selinux.go#L248-L266

At a first glance, I couldn't find discussion around that limit, so it could indeed have been just an arbitrary limit

I did find a later change in this repository that would allow setting a custom range through the exported CategoryRange variable (although using a package-var for that feels a bit clunky);

  • https://github.com/opencontainers/selinux/pull/100

Allow the category range to be changed

This keeps the range as part of the global state of the package but allows users to modify the upper bounds of the category range.

thaJeztah avatar Nov 28 '25 16:11 thaJeztah

I think it should be the random value,the hackers can't predict the SElinux labels,making the system more secure. I find max=1024 https://github.com/SELinuxProject/selinux-notebook/blob/c8de690a2a4799690c5fde03373d42cd52d76a46/src/notebook-examples/embedded-policy/reference-policy/build.conf#L73

ningmingxiao avatar Nov 29 '25 12:11 ningmingxiao

Yeah the RedHat docs I found when I looked at this the other day all imply that the 1024 limit is a real system wide limit.

In that case I think we should return an error. Hoping for another process to release the resources is a little too optimistic and can easily lead to very long stalls.

However, I'm still a little confused why we are allocating MCS categories in a per-process manner in the first place.

runc only spawns one container per execution, which means this effectively makes all of this reallocation code a no-op in practice AFIACS (in opencontainers/runc#5048 you mention that runc does hit this but don't provide any information about reproducing it -- can you provide some information about where and how you run into this limit with runc?).

Furthermore, my understanding of MCS is that you use it to segregate instances of the same label from one another (i.e., other containers from each other in this instance). The current approach of allocating randomly (with no way of detecting collisions from other runc invocations) seems a ill-advised AFAICS.

I think it should be the random value,the hackers can't predict the SElinux labels,making the system more secure.

I'm not sure if predictable labels are a problem -- I'm not an SELinux expert but IIUC the attacker would need to be able to modify labels to take advantage of this knowledge which to me seems like a fairly privileged.

That being said, it seems to me the fact that the labels are random is what is allowing containers to have different labels today (because of the issue I describe above). Hmm...

cyphar avatar Nov 30 '25 09:11 cyphar

yes, return an error is necessary. pr https://github.com/opencontainers/selinux/pull/246 works well. If we want to use Sequential ID to speed up we can create another pr. People have less chance to create too many containers (more than 1000) on a node.

ningmingxiao avatar Dec 01 '25 01:12 ningmingxiao