[Bug] - Changing Scheduler to SCHED_RR not working
Describe the bug
Changing the scheduling policy to SCHED_RR or SCHED_FIFO does not work.
chrt, sched_setscheduler(), and CPUSchedulingPolicy=rr in a systemd service all fail with "Operation not permitted".
I've tried running as root, with sudo, and after sudo setcap cap_sys_nice+ep "$(readlink -f $(which something))", but none of these worked.
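EPERM from sched_setscheduler() usually means the caller lacks CAP_SYS_NICE. However, a kernel built with RT group scheduling also returns EPERM, even for root, when the caller's task group has an RT runtime budget of 0. So it can be worth confirming that the capability is actually present before suspecting anything else. The sketch below uses a hypothetical helper name. It tests bit 23 (CAP_SYS_NICE in capabilities(7)) of a hex capability mask such as the CapEff line in /proc/self/status.

```sh
# Hypothetical helper: succeeds if the given hex capability mask
# (as printed in /proc/<pid>/status) includes CAP_SYS_NICE (bit 23).
# With no argument it checks the effective set (CapEff) of the current process.
has_cap_sys_nice() {
    local mask="${1:-$(awk '/^CapEff:/ {print $2}' /proc/self/status)}"
    # Shift bit 23 down to position 0 and test it.
    [ $(( (0x$mask >> 23) & 1 )) -eq 1 ]
}

# Example: sudo sh -c '. ./caps.sh; has_cap_sys_nice && echo "CAP_SYS_NICE present"'
```

If this succeeds but chrt still fails, the capability is not the problem.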
To Reproduce
$ sudo chrt --rr 10 ls
This fails with the error: chrt: failed to set pid 0's policy: Operation not permitted
Expected behavior
I expected it to succeed when run as root, but it fails as well. Is this a bug?
Additional context
I'm a novice Linux user, so please let me know if I've missed something.
I've found the cause: CONFIG_RT_GROUP_SCHED is enabled in this kernel, unlike in other common general-purpose kernels.
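One quick way to check a kernel for this option is to look for the exact line CONFIG_RT_GROUP_SCHED=y in its build config, the same file the later grep commands in this thread inspect. The sketch below is a minimal example. The function name is hypothetical, and it assumes the config is installed as /boot/config-$(uname -r). A different path can be passed as the first argument.

```sh
# Hypothetical helper: succeeds if the kernel config enables RT group scheduling.
# A disabled option appears as "# CONFIG_RT_GROUP_SCHED is not set" and does not match.
rt_group_sched_enabled() {
    local config="${1:-/boot/config-$(uname -r)}"
    grep -qx 'CONFIG_RT_GROUP_SCHED=y' "$config"
}

# Example: rt_group_sched_enabled && echo "RT group scheduling is enabled"
```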
This is correct, and we are looking into it. There are ways to assign RT time to the cgroups created by systemd, but it's cumbersome, and other distributions appear to disable CONFIG_RT_GROUP_SCHED instead. We're investigating the best option for AL2023. We'll either turn this off in a future release or document how to work around it with systemd.
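To show why the per-cgroup route is cumbersome, here is a rough sketch under stated assumptions. With CONFIG_RT_GROUP_SCHED on a cgroup v1 hierarchy, each group under the cpu controller has a cpu.rt_runtime_us file. New groups start with a runtime of 0. The runtimes of a group's children may not add up to more than the group's own runtime, so a budget has to be written top-down along the whole path, starting below the root. The function name, the CPU_CGROUP_ROOT variable, and its default mount point are assumptions. Per the kernel's cgroup v2 documentation, v2 does not support control of realtime processes, so this does not apply there.

```sh
# Hypothetical helper (cgroup v1 only): give an RT budget to every group along
# a path such as /user.slice/user-1000.slice/session-1.scope, parent first,
# because a child's cpu.rt_runtime_us must fit within its parent's budget.
# CPU_CGROUP_ROOT is an assumed mount point and can be overridden.
grant_rt_runtime() {
    local rel="$1" runtime_us="$2"
    local root="${CPU_CGROUP_ROOT:-/sys/fs/cgroup/cpu,cpuacct}"
    local dir="$root" part
    local IFS=/
    for part in $rel; do
        [ -n "$part" ] || continue   # skip the empty field before the leading "/"
        dir="$dir/$part"
        printf '%s\n' "$runtime_us" > "$dir/cpu.rt_runtime_us" || return 1
    done
}

# Example (as root): grant_rt_runtime /user.slice/user-1000.slice/session-1.scope 100000
```

Writing the same value at every level is a simplification. On a real system, each value must also leave room for any siblings that already hold RT runtime.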
@ozbenh Thanks for reply!
The README in the systemd repository says:
We recommend to turn off Real-Time group scheduling in the kernel when using systemd. RT group scheduling effectively makes RT scheduling unavailable for most userspace, since it requires explicit assignment of RT budgets to each unit whose processes making use of RT. As there's no sensible way to assign these budgets automatically this cannot really be fixed, and it's best to disable group scheduling hence. CONFIG_RT_GROUP_SCHED=n
A very naive quick fix for this is to disable RT throttling:
sudo sysctl -w kernel.sched_rt_runtime_us=-1
Is this "safe" if I can make sure my RT tasks don't starve the rest of the system, or should I avoid it?
I'd welcome a "safe" workaround, or a future AL2023 build without CONFIG_RT_GROUP_SCHED :)
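In numbers, RT throttling lets RT tasks run for sched_rt_runtime_us out of every sched_rt_period_us. With the kernel defaults of 950000 and 1000000, that is 950 ms per second (95%), which leaves 5% of each period for non-RT tasks. Setting the runtime to -1 removes that reserve, so a runaway RT task can monopolize a CPU. On kernels with CONFIG_RT_GROUP_SCHED, -1 also turns off the check that refuses RT policies to groups with zero runtime, which is why the quick fix above works. Note that sysctl -w does not persist across reboots. The helper below is a hypothetical sketch that prints the resulting share.

```sh
# Hypothetical helper: print the share of each RT period that RT tasks may use,
# or "unlimited" when throttling is disabled (runtime = -1).
# With no arguments it reads the live values from /proc/sys/kernel.
rt_budget() {
    local runtime="${1:-$(cat /proc/sys/kernel/sched_rt_runtime_us)}"
    local period="${2:-$(cat /proc/sys/kernel/sched_rt_period_us)}"
    if [ "$runtime" -eq -1 ]; then
        echo unlimited
    else
        # Integer percentage, e.g. 950000 * 100 / 1000000 = 95
        echo "$(( runtime * 100 / period ))%"
    fi
}
```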
@ozbenh Hey, do you have any news on this? We're facing the same issue.
It looks like our kernel team might have dropped the ball on that one. I'll poke internally.
I just verified that CONFIG_RT_GROUP_SCHED is not set in our current 6.1 kernels; it looks like we disabled it a while back (in mid-2023). Are you still experiencing issues with current AMIs? If so, please tell us more about your specific problem.
Are you still experiencing issues with current AMIs?
It seems so: without the workaround proposed in this ticket, chrt doesn't work on the machine. What information would you need? I can run some tests if needed.
@goznauk I'll follow up on this reported issue about sched_rt.
I tried to reproduce the issue you reported, but I don't see the same error.
- On AL2023 with kernel 6.12:
grep CONFIG_RT_GROUP_SCHED /boot/config-6.12.37-61.105.amzn2023.x86_64
# CONFIG_RT_GROUP_SCHED is not set
uname -r
6.12.37-61.105.amzn2023.x86_64
sudo strace chrt --rr 10 ls 2>&1 | grep sched
sched_get_priority_min(SCHED_RR) = 1
sched_get_priority_max(SCHED_RR) = 99
sched_setscheduler(0, SCHED_RR, [10]) = 0
- On AL2023 with kernel 6.1:
uname -r
6.1.131-143.221.amzn2023.x86_64
grep CONFIG_RT_GROUP_SCHED /boot/config-6.1.131-143.221.amzn2023.x86_64
# CONFIG_RT_GROUP_SCHED is not set
sudo strace chrt --rr 10 ls 2>&1 | grep sched
sched_get_priority_min(SCHED_RR) = 1
sched_get_priority_max(SCHED_RR) = 99
sched_setscheduler(0, SCHED_RR, [10]) = 0
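When comparing results across kernels and AMIs, only the return value of sched_setscheduler() matters in the strace output above. strace prints a successful call as "= 0" and a failed one as "= -1 EPERM (Operation not permitted)". The filter below is a minimal sketch with a hypothetical name. It reduces the trace to "ok" or the errno name.

```sh
# Hypothetical filter: read strace output on stdin and report how
# sched_setscheduler() returned: "ok" for "= 0", otherwise the errno name.
setscheduler_result() {
    awk '/^sched_setscheduler\(/ {
        if ($0 ~ /= 0$/) print "ok"
        # Skip the 5 characters "= -1 " and print the errno name that follows.
        else if (match($0, /= -1 E[A-Z]+/)) print substr($0, RSTART + 5, RLENGTH - 5)
    }'
}

# Example: sudo strace chrt --rr 10 ls 2>&1 | setscheduler_result
```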
Could you add more information about your Amazon Linux instance, e.g. the kernel version (6.1 or 6.12)? Did you customize the Amazon kernel?
The fundamental question is: why do you need real-time group scheduling (CONFIG_RT_GROUP_SCHED)? Why can't group scheduling for the default CFS (Completely Fair Scheduler) class (CONFIG_FAIR_GROUP_SCHED) meet your use case? Could you give us more details about your particular CPU scheduling requirements?
In addition, when you ran chrt, which cgroup was your process in? For example:
cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-1.scope
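Under RT group scheduling, the process's cgroup under the cpu controller determines which RT budget applies, so this line matters. Each line of /proc/self/cgroup has the form hierarchy-ID:controllers:path. The sketch below uses a hypothetical helper name. It prefers a cgroup v1 entry that lists the cpu controller and falls back to the cgroup v2 unified entry (the "0::" line shown above).

```sh
# Hypothetical helper: read /proc/<pid>/cgroup lines on stdin and print the
# cgroup that the cpu controller uses, prefixed with "v1" or "v2".
cpu_cgroup_path() {
    awk '
    {
        # Split only on the first two colons so paths containing ":" survive.
        i = index($0, ":"); rest = substr($0, i + 1)
        j = index(rest, ":"); ctrl = substr(rest, 1, j - 1); path = substr(rest, j + 1)
        n = split(ctrl, c, ",")
        for (k = 1; k <= n; k++) if (c[k] == "cpu") v1 = path
        if (substr($0, 1, i - 1) == "0" && ctrl == "") v2 = path
    }
    END { if (v1 != "") print "v1 " v1; else if (v2 != "") print "v2 " v2 }'
}

# Example: cpu_cgroup_path < /proc/self/cgroup
```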
OK, I read the originally reported issue again. As I understand it, you are not saying that a process can't be switched to the real-time scheduler in general, as in your simple reproducer. You are saying that you can't switch a task in a systemd cgroup to the real-time scheduler. FYI, the real-time scheduling class itself is always available and is not behind any configuration option.
So, could you provide more detail about your reproduction steps when using systemd cgroups?
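For a systemd-based reproduction, something like the sketch below may help. It prints a minimal service unit that uses the CPUSchedulingPolicy= and CPUSchedulingPriority= settings from systemd.exec. The helper name, the unit name rr-test, and the default command are placeholders. The commented commands show one way to install the unit and inspect the scheduling policy of its running process.

```sh
# Hypothetical helper: print a minimal service unit whose main process should be
# started under SCHED_RR. Command and priority are placeholders.
rr_unit() {
    local cmd="${1:-/usr/bin/sleep 600}" prio="${2:-10}"
    cat <<EOF
[Unit]
Description=SCHED_RR reproducer (hypothetical test unit)

[Service]
ExecStart=$cmd
CPUSchedulingPolicy=rr
CPUSchedulingPriority=$prio
EOF
}

# Example usage (unit name rr-test is a placeholder):
#   rr_unit | sudo tee /etc/systemd/system/rr-test.service
#   sudo systemctl daemon-reload && sudo systemctl start rr-test
#   chrt -p "$(systemctl show -p MainPID --value rr-test)"
# If systemd cannot apply the policy, the unit fails and
# systemctl status rr-test reports status=214/SETSCHEDULER.
```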