
How to use cgroups to restrict the resource use of bees.

Open daiaji opened this issue 3 years ago • 10 comments

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/resource_management_guide/sec-modifying_control_groups
https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Dynamically_isolating_CPUs

Basically, I am running a Windows 10 VM with GPU passthrough to play some games, but the heavy background resource usage of bees makes games unplayable whenever bees is enabled.

I tried these commands, but they didn't seem to help much.

systemctl set-property --runtime -- system.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- user.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- init.scope AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17

Then I tried the following commands:

systemctl set-property --runtime -- system.slice AllowedCPUs=0,1,2,12,13,14
systemctl set-property --runtime -- user.slice AllowedCPUs=3,4,5,15,16,17
systemctl set-property --runtime -- init.scope AllowedCPUs=3,4,5,15,16,17

But there seems to be no improvement. Does this mean that I need to limit the memory and block IO usage of bees as well?
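For example, would something like this be the right direction (assuming the upstream beesd@<UUID>.service template; the values are placeholders)?

systemctl set-property --runtime beesd@df6d2459-8f5a-4585-a2ff-5461bf494be4.service MemoryHigh=2G IOWeight=10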

I also use Looking Glass. This is my CPU topology:

lscpu -e                                    
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ       MHZ
  0    0      0    0 0:0:0:0          yes 4672.0698 2200.0000 4240.7588
  1    0      0    1 1:1:1:0          yes 4672.0698 2200.0000 4242.0342
  2    0      0    2 2:2:2:0          yes 4672.0698 2200.0000 4244.5869
  3    0      0    3 4:4:4:1          yes 4672.0698 2200.0000 4130.1152
  4    0      0    4 5:5:5:1          yes 4672.0698 2200.0000 3313.7590
  5    0      0    5 6:6:6:1          yes 4672.0698 2200.0000 2377.3391
  6    0      0    6 8:8:8:2          yes 4672.0698 2200.0000 2160.5420
  7    0      0    7 9:9:9:2          yes 4672.0698 2200.0000 2165.0000
  8    0      0    8 10:10:10:2       yes 4672.0698 2200.0000 2753.5549
  9    0      0    9 12:12:12:3       yes 4672.0698 2200.0000 2199.7791
 10    0      0   10 13:13:13:3       yes 4672.0698 2200.0000 2199.7839
 11    0      0   11 14:14:14:3       yes 4672.0698 2200.0000 2199.7261
 12    0      0    0 0:0:0:0          yes 4672.0698 2200.0000 4243.2100
 13    0      0    1 1:1:1:0          yes 4672.0698 2200.0000 4244.3301
 14    0      0    2 2:2:2:0          yes 4672.0698 2200.0000 4245.4038
 15    0      0    3 4:4:4:1          yes 4672.0698 2200.0000 3053.0010
 16    0      0    4 5:5:5:1          yes 4672.0698 2200.0000 3773.1399
 17    0      0    5 6:6:6:1          yes 4672.0698 2200.0000 2415.8169
 18    0      0    6 8:8:8:2          yes 4672.0698 2200.0000 2169.6240
 19    0      0    7 9:9:9:2          yes 4672.0698 2200.0000 2151.9880
 20    0      0    8 10:10:10:2       yes 4672.0698 2200.0000 2615.6260
 21    0      0    9 12:12:12:3       yes 4672.0698 2200.0000 2199.6609
 22    0      0   10 13:13:13:3       yes 4672.0698 2200.0000 2199.7520
 23    0      0   11 14:14:14:3       yes 4672.0698 2200.0000 2199.7190

daiaji avatar Aug 15 '22 14:08 daiaji

btrfs is fairly CPU-heavy in the kernel, and this CPU usage can bypass normal process priority control mechanisms. Also there is much less isolation between processes for IO requests in btrfs than in other filesystems, so that limiting IO in any cgroup will slow down all processes on the system.

Your best bet is to reduce the number of worker threads in bees using the -c or -g options, which reduces the load at the source. This will also reduce the memory requirement: libc adds about 128 MB per CPU for malloc arenas alone.
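For example (a sketch; the binary path and the mount point are placeholders and vary by system):

# cap bees at 2 worker threads instead of one per CPU
/usr/libexec/bees --thread-count=2 /mnt/btrfs

# or let bees scale its threads down whenever the load average exceeds 5
/usr/libexec/bees --loadavg-target=5 /mnt/btrfs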

Zygo avatar Aug 15 '22 16:08 Zygo

sudo beesd -c=6 /run/bees/mnt/df6d2459-8f5a-4585-a2ff-5461bf494be4
uuidparse: invalid option -- 'c'
Try 'uuidparse --help' for more information.
ERROR: No config for -c=6

That's strange.

daiaji avatar Aug 17 '22 11:08 daiaji

btrfs is fairly CPU-heavy in the kernel, and this CPU usage can bypass normal process priority control mechanisms.

I think there's also a lot of unrelated lock contention which affects VM performance. Maybe it helps to put the VM onto a separate filesystem. OTOH, using chattr +m on the VM image directory (before creating the image), moving it to a dedicated subvolume, and throwing lots of RAM at the system (32 GB+) helped in my use cases.
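Roughly like this, assuming the pool is mounted at /mnt/btrfs-pool (image name and size are just examples; the attribute has to be set before the image file is created so new files inherit it):

# dedicated subvolume keeps VM extents out of the main trees
btrfs subvolume create /mnt/btrfs-pool/vm-images
# +m on the directory: new files inherit the no-compression attribute
chattr +m /mnt/btrfs-pool/vm-images
# create the image only after the attribute is set
qemu-img create -f raw /mnt/btrfs-pool/vm-images/win10.img 100G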

Also there is much less isolation between processes for IO requests in btrfs than in other filesystems, so that limiting IO in any cgroup will slow down all processes on the system.

I think I already noticed that and removed cgroup priority and bandwidth settings. Thanks for confirming.

kakra avatar Aug 17 '22 14:08 kakra

I have two NVME SSDs, one of which has been passed through to the VM.

daiaji avatar Aug 17 '22 14:08 daiaji

I have two NVME SSDs, one of which has been passed through to the VM.

So it affects even completely isolated IO?

sudo beesd -c=6 /run/bees/mnt/df6d2459-8f5a-4585-a2ff-5461bf494be4
uuidparse: invalid option -- 'c'
Try 'uuidparse --help' for more information.
ERROR: No config for -c=6

That's expected. If you're using bees directly with a mount point, don't use the beesd wrapper; use /usr/libexec/bees directly (or /usr/lib/bees or similar, depending on distribution). Also, it's probably -c6 or --thread-count=6. I'm not sure the arg parser groks -c=6 well.
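Something like this should work (adjust the binary path for your distribution):

sudo /usr/libexec/bees --thread-count=6 /run/bees/mnt/df6d2459-8f5a-4585-a2ff-5461bf494be4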

kakra avatar Aug 17 '22 15:08 kakra

Basically I use the systemd service. Is there any way to tune a running systemd service on the fly?

daiaji avatar Aug 17 '22 15:08 daiaji

Basically, I'm using this:

# /etc/systemd/system/bees.service
[Unit]
Description=Bees
Documentation=https://github.com/Zygo/bees
After=local-fs.target
RequiresMountsFor=/mnt/btrfs-pool

[Service]
Type=simple
Environment=BEESSTATUS=%t/bees/bees.status
ExecStart=/usr/libexec/bees --no-timestamps --strip-paths --thread-count=3 --loadavg-target=5 --verbose=5 /mnt/btrfs-pool
CPUAccounting=true
CPUWeight=12
IOSchedulingClass=idle
IOSchedulingPriority=7
KillMode=control-group
KillSignal=SIGTERM
MemoryAccounting=true
Nice=19
Restart=on-abnormal
ReadWritePaths=/mnt/btrfs-pool
RuntimeDirectory=bees
StartupCPUWeight=25
WorkingDirectory=/run/bees

# Hide other users' processes in /proc/
ProtectProc=invisible

# Mount / as read-only
ProtectSystem=strict

# Forbid access to /home, /root and /run/user
ProtectHome=true

# Mount tmpfs on /tmp/ and /var/tmp/.
# Cannot mount at /run/ or /var/run/ because they are used by systemd.
PrivateTmp=true

# Disable network access
PrivateNetwork=true

# Use private IPC namespace and UTS namespace
PrivateIPC=true
ProtectHostname=true

# Disable write access to kernel variables through /proc
ProtectKernelTunables=true

# Disable access to control groups
ProtectControlGroups=true

# Set capabilities of the new program
# The first three are required for accessing any file on the mounted filesystem.
# The last one is required for the privileged btrfs ioctls bees uses (e.g. TREE_SEARCH and LOGICAL_INO).
AmbientCapabilities=CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_FOWNER CAP_SYS_ADMIN

# With NoNewPrivileges, running sudo cannot gain any new privileges
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/bees.service.d/override.conf
[Service]
Slice=maintenance.slice
IOWeight=10
StartupIOWeight=25

# /etc/systemd/system/maintenance.slice
[Unit]
Description=Limit maintenance tasks resource usage

[Slice]
AllowedCPUs=16-19
CPUWeight=20
IOWeight=10

Adjust to your needs. I'm running on an i7-12700K; CPUs 16-19 are the E-cores. My subvolid=0 is mounted at /mnt/btrfs-pool.

If the kernel leaks IO priorities into other process contexts, you may want to remove these:

IOSchedulingClass=idle
IOSchedulingPriority=7

Tuning the services while running may partially work using systemctl edit [--full] {bees.service|maintenance.slice}, or via systemctl set-property.
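For example (the first change applies immediately and lasts until reboot; edits to unit files need a reload):

# adjust the running slice without restarting anything
systemctl set-property --runtime maintenance.slice AllowedCPUs=16-19 CPUWeight=10

# after systemctl edit, apply the changed unit files
systemctl daemon-reload
systemctl restart bees.service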

kakra avatar Aug 17 '22 15:08 kakra

It looks like I should use libvirt hooks to stop the bees service while the VM is running.
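Maybe a qemu hook along these lines (an untested sketch; "win10" stands in for my guest name):

#!/bin/sh
# /etc/libvirt/hooks/qemu (must be executable)
# libvirt calls this with: $1 = guest name, $2 = operation
if [ "$1" = "win10" ]; then
    case "$2" in
        prepare) systemctl stop beesd@df6d2459-8f5a-4585-a2ff-5461bf494be4.service ;;
        release) systemctl start beesd@df6d2459-8f5a-4585-a2ff-5461bf494be4.service ;;
    esac
fi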

daiaji avatar Aug 17 '22 16:08 daiaji

If the kernel leaks IO priorities into other process contexts,

How does one find out for sure whether their kernel does or does not do that? For example mine (NixOS 6.2.11, i.e. the latest and presumably mostly-vanilla) apparently does, judging by system behavior with and without the IOSchedulingClass=idle setting in the bees service (local audio playback stutters with it, works fine without it).

And if the norm is that the kernel does leak IO priorities, then maybe those lines should be removed from beesd@.service, or at least commented out with an explanation?

cmm avatar Apr 20 '23 08:04 cmm

There seem to be a lot of issues coming into play here:

  • IOSchedulingClass leaks into other processes' IO, maybe because tree updates affect multiple processes. I don't know why, but it seems to be a thing (thanks @cmm, interesting observation).
  • Using memory cgroups probably lets bees steal ownership of cached content from other processes: if you limit bees' memory via cgroups, you are actually discarding valuable caches of other processes early. This is not limited to bees, but bees tends to touch fresh data generated by other processes, so I currently disable memory cgroups in my kernel and use the generational LRU instead.
  • The more snapshots, the higher the kernel sys time caused by bees, which negatively affects other processes, probably due to tree locks. I think this can be somewhat mitigated by adding more spindles (or other storage that adds new IO queues) to the pool, and by isolating workloads to different subvolumes.
  • I've discovered that some btrfs operations cause vast amounts of kernel memory allocations, showing up in /proc/buddyinfo as huge numbers of order-1 and order-2 free memory blocks, essentially causing memory fragmentation. This in turn discards caches early due to memory pressure and leads to IO stutter and stalls, affecting even unrelated processes and filesystems. The behavior seems to occur in short bursts, and transparent hugepages magnify the issue exponentially. Increasing vm.min_free_kbytes can somewhat mitigate the problem (see the sketch below).
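For that last point, a rough mitigation sketch (262144 KiB is just an example value, tune it to your RAM size):

# watch order-1/order-2 free block counts per zone for fragmentation bursts
cat /proc/buddyinfo

# raise the free-memory watermark so allocation bursts don't evict caches (value in KiB)
sysctl -w vm.min_free_kbytes=262144

# keep transparent hugepages from magnifying the issue
echo never > /sys/kernel/mm/transparent_hugepage/enabled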

kakra avatar Apr 20 '23 10:04 kakra