Kmesh Logs Errors and Crashes After Deploying 165 ServiceEntries
Motivation:
A limit of 165 ServiceEntries is lower than expected. Our production use case requires support for a very large number of services, ServiceEntries, and pods.
Environment Details:
- Kubernetes: 1.28
- OS: openEuler 23.03
- Istio: 1.19
- Kmesh version: release 0.5
- CPU: 8
- Memory: 16 GiB
Steps To Reproduce
- Step 1: Make sure you have the below `service-entry.yaml` file at the root of your repo. This config defines 1 endpoint per ServiceEntry.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-service
  namespace: default
spec:
  hosts:
  - foo-service.somedomain # not used
  addresses:
  - 192.192.192.192/24 # VIPs
  ports:
  - number: 27018
    name: foo-service
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: STATIC
  endpoints: # 1 endpoint per service entry. Adjust depending on your test.
  - address: 2.2.2.2
```
- Step 2: Run the below command.
```sh
for i in $(seq 1 165); do sed "s/foo-service/foo-service-0-$(date +%s-%N)/g" service-entry.yaml | kubectl apply -f -; done
```
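To confirm the count and catch the crash as it happens, a quick check like the below can help (the `kmesh-system` namespace and `kmesh` daemonset name are assumptions based on a default install; adjust to your deployment):

```sh
# Count the ServiceEntries actually created
kubectl get serviceentries -n default --no-headers | wc -l

# Tail the Kmesh daemon logs for errors or the crash
kubectl logs -n kmesh-system daemonset/kmesh -f | grep -iE 'error|malloc'
```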
What was observed
After the number of ServiceEntries hit 165, Kmesh started logging the below error (see attachment) and crashed.
Note: After trying this multiple times, the error message was sometimes different: `malloc(): invalid next size`.
cc @nlgwcy @lec-bit
There may be other model limitations. We'll check.
This is the same issue as https://github.com/kmesh-net/kmesh/issues/941: the maximum value size of the inner_map is 1300 bytes, which is why this issue occurred. When we create 163 virtualHosts in one routeConfigs, the pointer array no longer fits (163 * sizeof(ptr) > 1300). This problem can be avoided by manually adjusting the maximum value of the inner_map. kmesh.json
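For illustration, the overflow arithmetic on a 64-bit host (8-byte pointers), using the 1300-byte inner_map value size mentioned above:

```sh
echo $((163 * 8))   # 1304 bytes needed for the virtualHosts pointer array, > 1300
echo $((1300 / 8))  # 162: the largest pointer array that still fits in one value
```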
Maximum Endpoints and Services Supported by Kmesh
We modified the command to deploy every ServiceEntry on a separate port, so that each RouteConfig has only one virtual host. Below are the two scenarios we tested.
Scenario 1: 1 endpoint (minimum possible) per ServiceEntry
Steps
- Config file: The below config file will deploy one endpoint per ServiceEntry.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-service
  namespace: default
spec:
  hosts:
  - foo-service.somedomain # not used
  addresses:
  - 192.192.192.192/24 # VIPs
  ports:
  - number: 27018
    name: foo-service
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: STATIC
  endpoints: # 1 endpoint per service entry. Adjust depending on your test.
  - address: 2.2.2.2
```
- Command: Run the below command to deploy 1100 services, each with one endpoint (a verification sketch follows the list).
```sh
for i in $(seq 1 1100); do sed "s/foo-service/foo-service-0-$(date +%s-%N)/g;s/27018/$i/g" service-entry-1.yaml | kubectl apply -f -; done
```
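Once the loop finishes, the below can verify that every ServiceEntry landed on its own port; it should print 1100, assuming no other ServiceEntries exist in the namespace:

```sh
# Count the distinct port numbers across all ServiceEntries in the namespace
kubectl get serviceentries -n default \
  -o jsonpath='{range .items[*]}{.spec.ports[0].number}{"\n"}{end}' | sort -un | wc -l
```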
Results
The below errors are observed when slightly more than 1000 ServiceEntries are deployed.
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_943 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_48 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_117 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_138 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_383 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_603 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_739 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_786 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_79 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_354 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_591 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_675 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_729 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_816 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
Why is this an issue?
Our use case needs to support a higher number of services and endpoints, and roughly 1,000 services is far below the theoretical maximums of 100,000 endpoints and 5,000 services.
Scenario 2: 150 endpoints (maximum possible) per ServiceEntry
Steps
- Config file: Run the same test as before, but increase the number of endpoint addresses in the config to 150 (the `endpoints` list at the end of the config); a generation sketch follows this list.
- Command: Run the below command to deploy 600 services, each with 150 endpoints.

```sh
for i in $(seq 1 600); do sed "s/foo-service/foo-service-0-$(date +%s-%N)/g;s/27018/$i/g" service-entry-1.yaml | kubectl apply -f -; done
```
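For reference, a minimal sketch of how the 150-endpoint variant can be generated from the base config (the 10.0.0.x addresses are placeholders; it assumes the base file ends with the single `- address: 2.2.2.2` endpoint line):

```sh
# Drop the single endpoint line, then append 150 placeholder endpoint addresses
sed '$d' service-entry.yaml > service-entry-1.yaml
for i in $(seq 1 150); do
  echo "  - address: 10.0.0.$i" >> service-entry-1.yaml
done
```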
Results
The same errors as in Scenario 1 are observed at approximately 500 services (75,000 endpoints in total).
Why is this an issue?
Our use case needs to support a higher number of endpoints, and 75,000 endpoints is lower than the theoretical maximums of 100,000 endpoints and 5,000 services.