Memory scaling issue with a large number of services
Motivation: Our business case needs memory usage to scale with the number of services, especially in very large-scale scenarios.
1. What we did
Environment Details:
- Kubernetes: 1.28 (CCE)
- Istio: 1.19
- Kmesh version: release 0.5
- CPU: 8
- Memory: 16 GiB
- Tool for measuring memory: Inspektor-Gadget
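Inspektor Gadget was used through its kubectl plugin; for completeness, a minimal deployment sketch (assuming krew is available; this is an assumption about the setup, not something recorded above):
# Sketch: install the Inspektor Gadget kubectl plugin via krew and deploy it to the cluster.
kubectl krew install gadget
kubectl gadget deploy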
We scaled up in batches of 500 services using the YAML file and command below.
- YAML file
# svc.yaml
kind: Service
apiVersion: v1
metadata:
  name: foo-service
  labels:
    foo: bar
spec:
  clusterIP: None
  selector:
    app: foo
  ports:
    - port: 5678
- scaling up command
$ for i in $(seq 1 500); do sed "s/foo-service/foo-service-0-$(date +%s-%N)/g" svc.yaml | kubectl apply -f -; done
After every 500 services, we measured memory consumption with Inspektor Gadget. The command used to measure the memory is: kubectl gadget top ebpf --sort comm
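For reference, a minimal sketch of the whole loop described above, creating services in batches and taking a measurement after each batch (the batch index in the service name and the stop-with-Ctrl-C note are ours; svc.yaml is the file above):
#!/usr/bin/env bash
# Sketch: create services in batches of 500, then measure BPF map memory
# with Inspektor Gadget after each batch. Assumes svc.yaml from above and
# the Inspektor Gadget kubectl plugin are already in place.
for batch in $(seq 1 4); do
  for i in $(seq 1 500); do
    sed "s/foo-service/foo-service-${batch}-$(date +%s-%N)/g" svc.yaml | kubectl apply -f -
  done
  # kubectl gadget top runs interactively; stop it with Ctrl-C once the
  # MAPMEMORY column has been recorded.
  kubectl gadget top ebpf --sort comm
done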
2. What we observed:
The total memory usage of the kmesh BPF maps remained constant, even though the number of entries in the maps increased (table below).
The detailed memory-consumption table is attached below (memory_hce.txt); please refer to the MAPMEMORY column.
3. Why we think this is a problem
Our business case requires memory usage to scale with the number of deployed services rather than remaining fixed.
In order to reduce the memory cost, we need to tune the scale-up/scale-in parameters. cc @nlgwcy
OK, maybe the scale-up/scale-in step is too big. I will optimize it later.
I tried to use Inspektor-Gadget, but it could not account for the memory of the inner_map, so no memory change was visible in its output (table below; a quick bpftool check follows the table).
K8S.NODE PROGID TYPE NAME PID COMM RUNTIME RUNCOU… MAPMEMORY MAPCOUNT
ambient-worker 409 CGroupSockA… cgroup_conn… 1061919 kmesh-daemon 8.356µs 4 308KiB 6
ambient-worker 406 SockOps sockops_prog 1061919 kmesh-daemon 6.804µs 36 8.18MiB 8
ambient-control-plane 408 SockOps sockops_prog 1061921 kmesh-daemon 3.115µs 27 8.18MiB 8
ambient-control-plane 410 CGroupSockA… cgroup_conn… 1061921 kmesh-daemon 2.035µs 3 308KiB 6
ambient-worker 395 RawTracepoi… connect_ret 1061919 kmesh-daemon 1.523µs 4 0B 0
ambient-control-plane 396 RawTracepoi… connect_ret 1061921 kmesh-daemon 90ns 3 0B 0
ambient-control-plane 399 SockOps cluster_man… 1061921 kmesh-daemon 0s 0 133.1MiB 9
ambient-control-plane 400 SockOps filter_chai… 1061921 kmesh-daemon 0s 0 8.078MiB 7
ambient-control-plane 403 SockOps filter_mana… 1061921 kmesh-daemon 0s 0 8.074MiB 6
ambient-control-plane 407 SockOps route_confi… 1061921 kmesh-daemon 0s 0 11.95MiB 8
ambient-control-plane 414 CGroupSockA… cluster_man… 1061921 kmesh-daemon 0s 0 133.1MiB 9
ambient-control-plane 415 CGroupSockA… filter_chai… 1061921 kmesh-daemon 0s 0 8.078MiB 7
ambient-control-plane 416 CGroupSockA… filter_mana… 1061921 kmesh-daemon 0s 0 8.074MiB 6
ambient-worker 401 SockOps cluster_man… 1061919 kmesh-daemon 0s 0 133.1MiB 9
ambient-worker 402 SockOps filter_chai… 1061919 kmesh-daemon 0s 0 8.078MiB 7
ambient-worker 404 SockOps filter_mana… 1061919 kmesh-daemon 0s 0 8.074MiB 6
ambient-worker 405 SockOps route_confi… 1061919 kmesh-daemon 0s 0 11.95MiB 8
ambient-worker 411 CGroupSockA… cluster_man… 1061919 kmesh-daemon 0s 0 133.1MiB 9
ambient-worker 412 CGroupSockA… filter_chai… 1061919 kmesh-daemon 0s 0 8.078MiB 7
ambient-worker 413 CGroupSockA… filter_mana… 1061919 kmesh-daemon 0s 0 8.074MiB 6
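If the unaccounted memory indeed sits in the inner maps, one of the unnamed maps can be inspected directly with bpftool (a sketch; the id is simply whatever the first unnamed map happens to be):
# Sketch: pick one of the unnamed (inner) maps and show its details, including memlock.
ID=$(sudo bpftool map -j | jq -r '[ .[] | select(.name == null) ][0].id')
sudo bpftool map show id "$ID"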
I used the bpftool command to inspect the BPF maps and their statistics, and found that the memory does change. I created 10000 services, and during the process kmesh scaled up.
sudo bpftool map -j | jq ' group_by(.name) | map({name: .[0].name, total_bytes_memlock: map(.bytes_memlock | tonumber) | add, maps: length}) | sort_by(.total_bytes_memlock)'
before:
[root@localhost kmesh]# sudo bpftool map -j | jq ' group_by(.name) | map({name: .[0].name, total_bytes_memlock: map(.bytes_memlock | tonumber) | add, maps: length}) | sort_by(.total_bytes_memlock)'
[
{
"name": "kmesh_events",
"total_bytes_memlock": 0,
"maps": 2
},
{
"name": "map_of_sock_sto",
"total_bytes_memlock": 0,
"maps": 2
},
{
"name": "bpf_log_level",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "ig_fa_pick_ctx",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "inner_map",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "kmesh_version",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "tmp_buf",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "tmp_log_buf",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": ".rodata",
"total_bytes_memlock": 49152,
"maps": 6
},
{
"name": "kmesh_listener",
"total_bytes_memlock": 212992,
"maps": 2
},
{
"name": "kmesh_tail_call",
"total_bytes_memlock": 278528,
"maps": 8
},
{
"name": "kmesh_manage",
"total_bytes_memlock": 393216,
"maps": 2
},
{
"name": "ig_fa_records",
"total_bytes_memlock": 1974272,
"maps": 2
},
{
"name": "containers",
"total_bytes_memlock": 2113536,
"maps": 2
},
{
"name": "exec_args",
"total_bytes_memlock": 3940352,
"maps": 2
},
{
"name": "map_of_router_c",
"total_bytes_memlock": 8126464,
"maps": 2
},
{
"name": "kmesh_cluster",
"total_bytes_memlock": 8388608,
"maps": 2
},
{
"name": "outer_map",
"total_bytes_memlock": 16777216,
"maps": 2
},
{
"name": "map_of_cluster_",
"total_bytes_memlock": 253771776,
"maps": 4
},
{
"name": null,
"total_bytes_memlock": 268419072,
"maps": 65532
}
]
after:
[root@localhost kmesh]# sudo bpftool map -j | jq ' group_by(.name) | map({name: .[0].name, total_bytes_memlock: map(.bytes_memlock | tonumber) | add, maps: length}) | sort_by(.total_bytes_memlock)'
[
{
"name": "kmesh_events",
"total_bytes_memlock": 0,
"maps": 2
},
{
"name": "map_of_sock_sto",
"total_bytes_memlock": 0,
"maps": 2
},
{
"name": "bpf_log_level",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "ig_fa_pick_ctx",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "inner_map",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "kmesh_version",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "tmp_buf",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": "tmp_log_buf",
"total_bytes_memlock": 8192,
"maps": 2
},
{
"name": ".rodata",
"total_bytes_memlock": 49152,
"maps": 6
},
{
"name": "kmesh_listener",
"total_bytes_memlock": 212992,
"maps": 2
},
{
"name": "kmesh_tail_call",
"total_bytes_memlock": 278528,
"maps": 8
},
{
"name": "kmesh_manage",
"total_bytes_memlock": 393216,
"maps": 2
},
{
"name": "ig_fa_records",
"total_bytes_memlock": 1974272,
"maps": 2
},
{
"name": "containers",
"total_bytes_memlock": 2113536,
"maps": 2
},
{
"name": "exec_args",
"total_bytes_memlock": 3940352,
"maps": 2
},
{
"name": "map_of_router_c",
"total_bytes_memlock": 8126464,
"maps": 2
},
{
"name": "kmesh_cluster",
"total_bytes_memlock": 8388608,
"maps": 2
},
{
"name": "outer_map",
"total_bytes_memlock": 16777216,
"maps": 2
},
{
"name": "map_of_cluster_",
"total_bytes_memlock": 253771776,
"maps": 4
},
{
"name": null,
"total_bytes_memlock": 536846336,
"maps": 131066
}
]
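The growth shows up in the unnamed entries (name: null): before the scale-up they total 268419072 bytes across 65532 maps, and afterwards 536846336 bytes across 131066 maps, i.e. exactly 4096 bytes per map. Our reading is that these unnamed maps are the pre-allocated inner maps referenced by outer_map, which is precisely the memory Inspektor Gadget could not attribute. A small jq sketch to isolate just those entries (the interpretation of the null-name maps is ours):
# Sketch: count and total memlock of the unnamed maps only (assumed to be the
# inner maps pre-allocated for outer_map slots).
sudo bpftool map -j | jq '[ .[] | select(.name == null) ] | {maps: length, total_bytes_memlock: (map(.bytes_memlock | tonumber) | add)}'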
Below are the startup logs for kmesh. I started 1000 services, deleted them, and then brought them up again.
[root@localhost kmesh]# kubectl logs -f -n kmesh-system kmesh-57rbp
cp: cannot create regular file '/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64/kmesh.ko': Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.dep.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.dep.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.alias.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.alias.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.softdep.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.symbols.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.symbols.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.builtin.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.builtin.alias.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.devname.17.329771.1728891685, 301, 644): Read-only file system
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --bpf-fs-path=\"/sys/fs/bpf\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --cgroup2-path=\"/mnt/kmesh_cgroup2\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --cni-etc-path=\"/etc/cni/net.d\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --conflist-name=\"\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-accesslog=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-bpf-log=\"true\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-bypass=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-mda=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-secret-manager=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --help=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --mode=\"ads\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --plugin-cni-chained=\"true\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="kmesh start with Normal" subsys=bpf
Remaining resources are insufficient(0/0), and capacity expansion is required.
collect_outter_map_scaleup_slots:32767-32768-32767
time="2024-10-14T07:41:34Z" level=info msg="bpf loader start successfully" subsys=manager
time="2024-10-14T07:41:34Z" level=info msg="start kmesh manage controller successfully" subsys=controller
time="2024-10-14T07:41:34Z" level=info msg="service node sidecar~10.244.1.4~kmesh-57rbp.kmesh-system~kmesh-system.svc.cluster.local connect to discovery address istiod.istio-system.svc:15012" subsys=controller/config
time="2024-10-14T07:41:34Z" level=info msg="controller start successfully" subsys=manager
time="2024-10-14T07:41:34Z" level=info msg="start write CNI config" subsys="cni installer"
time="2024-10-14T07:41:34Z" level=info msg="kmesh cni use chained\n" subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="Copied /usr/bin/kmesh-cni to /opt/cni/bin." subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="cni config file: /etc/cni/net.d/10-kindnet.conflist" subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="start cni successfully" subsys=manager
time="2024-10-14T07:41:35Z" level=info msg="start watching file /var/run/secrets/kubernetes.io/serviceaccount/token" subsys="cni installer"
Remaining resources are insufficient(22956/32768), and capacity expansion is required.
collect_outter_map_scaleup_slots:65534-32768-65534
The remaining resources are sufficient(19659/65536) and scale-in is required.
collect_outter_map_scalein_slots:57343-8192-57343
The remaining resources are sufficient(17180/57344) and scale-in is required.
collect_outter_map_scalein_slots:49151-8192-49151
The remaining resources are sufficient(14737/49152) and scale-in is required.
collect_outter_map_scalein_slots:40959-8192-40959
The remaining resources are sufficient(12282/40960) and scale-in is required.
collect_outter_map_scalein_slots:24739-8192-24739
time="2024-10-14T08:13:33Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T08:30:40Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T08:43:42Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T09:12:46Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T09:19:31Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T09:45:27Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T10:08:24Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T10:14:09Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T10:42:11Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T10:57:16Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T11:10:35Z" level=info msg="grpc reconnect succeed" subsys=controller
Remaining resources are insufficient(22954/32768), and capacity expansion is required.
collect_outter_map_scaleup_slots:65533-32768-65533
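To watch these scale events without wading through the full log, a small sketch that follows just those lines (pod name as in the capture above):
# Sketch: follow only the outer_map scale-up/scale-in events of a kmesh daemon.
kubectl logs -f -n kmesh-system kmesh-57rbp | grep -E 'capacity expansion|scale-in is required|collect_outter_map_(scaleup|scalein)_slots'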