[Perf] Alcor Control Agent Performance Profiling
Request
- Set up a performance profiling framework for ACA
- Collect latency and throughput metrics for large payloads
- Optimize the ACA multithreading model
- Look into narrowing the locking scope to improve performance under high concurrency
- Investigate OVS DB batch insertion to improve performance
Linked to umbrella issue #440.
Per the issue description, I will break the ACA performance profiling task down into two major areas.
ACA handling of large payloads
- Framework to use: aca_tests to create large payloads and send them to ACA
- An example payload could be 1 port create plus 10, 100, ...1000, 10,000, 100,000 neighbors
- Collect latency and throughput metrics
- Identify bottlenecks and problematic areas (possibly OVS)
- Optimize the ACA multithreading model. Do we want to limit the maximum number of parallel threads to number of CPUs * 2?
- Can we bundle a batch (e.g. 10) of similar neighbors to process in a single call? It may help with the locking mechanism of ACA internal structures.
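The batching and thread-cap ideas above can be sketched roughly as follows. This is only an illustration in Python, not ACA code: `neighbor_table`, `table_lock`, `program_batch`, `make_neighbors`, and `run_once` are hypothetical names, the dictionary stands in for an ACA internal structure, and the batch size of 10 and the cap of number of CPUs * 2 are the values floated in this issue rather than measured optima. Python threads do not run CPU-bound work in parallel, so the numbers it prints show the shape of the measurement, not ACA's real performance.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10                           # batch size floated in this issue (assumption)
MAX_WORKERS = (os.cpu_count() or 1) * 2   # proposed cap: number of CPUs * 2

neighbor_table = {}                # hypothetical stand-in for an ACA internal structure
table_lock = threading.Lock()      # hypothetical lock guarding that structure


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def make_neighbors(count):
    """Build `count` synthetic neighbors with unique IPs (valid for count < 2**24)."""
    return [
        {
            "ip": f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}",
            "mac": f"02:00:00:{(i >> 16) & 255:02x}:{(i >> 8) & 255:02x}:{i & 255:02x}",
        }
        for i in range(count)
    ]


def program_batch(batch):
    """Prepare a batch outside the lock, then take the lock once per batch."""
    prepared = {n["ip"]: n["mac"] for n in batch}
    with table_lock:               # one acquisition per batch, not per neighbor
        neighbor_table.update(prepared)
    return len(batch)


def run_once(neighbor_count):
    """Process one synthetic payload and report latency and throughput."""
    neighbors = make_neighbors(neighbor_count)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        processed = sum(pool.map(program_batch, chunks(neighbors, BATCH_SIZE)))
    elapsed = time.perf_counter() - start
    return {
        "neighbors": processed,
        "latency_s": elapsed,
        "throughput_per_s": processed / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    for size in (10, 100, 1000, 10_000, 100_000):
        neighbor_table.clear()
        print(run_once(size))
```

Taking the lock once per batch instead of once per neighbor is the point of the bundling question: with a batch size of 10, lock acquisitions drop by roughly a factor of ten, at the cost of holding the lock slightly longer each time.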
ACA handling of packet-in messages from OVS
- Framework to use: cbench (https://github.com/mininet/oflops/tree/master/cbench) run against ACA acting as an OpenFlow controller
- Use the payload generated by cbench; test latency mode first, then throughput mode
- Collect latency and throughput metrics
- Identify bottlenecks and problematic areas
- Once on-demand L3 routing rules are implemented, a VM could quickly open many new connections to a new neighbor, which would generate a large number of packet-in messages for ACA to process. We need to confirm that ACA can handle this load
- If an ACA slowdown is observed, consider spinning up more threads to handle multiple packet-in messages in parallel
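One possible shape for handling packet-in messages in parallel is a fixed pool of worker threads draining a shared queue, with per-message latency recorded along the way. The sketch below is a Python illustration under assumptions, not ACA's implementation: `handle_packet_in` and `run_workers` are hypothetical names, the handler body is a placeholder, and the messages are plain dictionaries rather than real OpenFlow packet-in messages.

```python
import queue
import threading
import time


def handle_packet_in(message):
    """Placeholder for the real packet-in processing (hypothetical)."""
    return message["id"]


def run_workers(messages, worker_count):
    """Feed messages to `worker_count` threads; return latency and throughput."""
    inbox = queue.Queue()
    latencies = []
    latencies_lock = threading.Lock()

    def worker():
        while True:
            item = inbox.get()
            if item is None:           # sentinel: no more work for this worker
                return
            enqueued_at, message = item
            handle_packet_in(message)
            elapsed = time.perf_counter() - enqueued_at
            with latencies_lock:       # lock only around the shared list append
                latencies.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()

    start = time.perf_counter()
    for message in messages:
        inbox.put((time.perf_counter(), message))
    for _ in threads:                  # one sentinel per worker
        inbox.put(None)
    for t in threads:
        t.join()
    total = time.perf_counter() - start

    handled = len(latencies)
    return {
        "handled": handled,
        "avg_latency_s": sum(latencies) / handled if handled else 0.0,
        "throughput_per_s": handled / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    burst = [{"id": i} for i in range(10_000)]
    for workers in (1, 2, 4, 8):
        print(workers, run_workers(burst, workers))
```

Running the same burst with different worker counts gives a quick way to see whether adding threads helps or whether contention elsewhere dominates. In Python the handler would need to block on I/O for extra threads to show a gain, so the comparison is only meaningful once the placeholder is replaced with real work.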
Other Notes
- Can we use a framework like SeaStar to improve the ACA threading model? https://github.com/futurewei-cloud/chogori-seastar-rd