
[Perf] Alcor Control Agent Performance Profiling

Open · xieus opened this issue 5 years ago · 2 comments

Request

  • Set up a performance profiling framework for ACA
  • Collect latency and throughput metrics for large payloads
  • Optimize the ACA multithreading model
  • Look into narrowing the locking scope to improve performance under high concurrency
  • Investigate OVS DB batch insertion to improve performance
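The latency and throughput collection asked for above can start out very small. Below is a minimal sketch, assuming each request to ACA can be wrapped in a blocking callable that returns once ACA has finished processing it. `RunStats`, `percentile`, and `profile` are hypothetical names and are not part of aca_tests or ACA.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Summary statistics for one profiling run (hypothetical type).
struct RunStats {
    double p50_us;          // median per-request latency, microseconds
    double p99_us;          // 99th-percentile latency, microseconds
    double throughput_rps;  // completed requests per second of wall-clock time
};

// Percentile by rounded index into an already-sorted, non-empty sample vector.
double percentile(const std::vector<double>& sorted, double pct) {
    assert(!sorted.empty());
    std::size_t rank = static_cast<std::size_t>(pct / 100.0 * (sorted.size() - 1) + 0.5);
    return sorted[std::min(rank, sorted.size() - 1)];
}

// Calls `request` `iterations` times and times each call individually.
// `request` is a placeholder for "send one payload to ACA and wait for completion".
RunStats profile(const std::function<void()>& request, std::size_t iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    std::vector<double> latencies_us;
    latencies_us.reserve(iterations);
    const auto run_start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = Clock::now();
        request();
        const auto t1 = Clock::now();
        latencies_us.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    const double wall_s = std::chrono::duration<double>(Clock::now() - run_start).count();
    std::sort(latencies_us.begin(), latencies_us.end());
    return {percentile(latencies_us, 50), percentile(latencies_us, 99),
            wall_s > 0 ? iterations / wall_s : 0.0};
}
```

This harness measures requests one at a time, so it reports per-request latency cleanly; a throughput test under concurrency would issue requests from several threads and divide total completions by total wall time instead.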

xieus, Oct 21 '20 22:10

Linked to an umbrella issue #440.

xieus, Oct 21 '20 22:10

Per the issue description, I will break the ACA performance profiling task down into two major areas.

ACA handling of large payload

  1. Framework to use: aca_tests, to create large payloads and send them to ACA
  2. An example payload could be 1 port create plus 10, 100, 1,000, 10,000, or 100,000 neighbors
  3. Collect latency and throughput metrics
  4. Identify bottlenecks and problem areas (possibly OVS)
  5. Optimize the ACA multithreading model. Should we cap the number of parallel threads at 2x the number of CPUs?
  6. Can we bundle a batch (e.g., 10) of similar neighbors and process them in a single call? This may reduce lock contention on ACA's internal structures.
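Items 5 and 6 can be combined in one small experiment. Below is a minimal sketch, assuming neighbors are written into an internal table protected by a single mutex; `Neighbor`, `NeighborTable`, `max_worker_threads`, and `program_neighbors` are hypothetical stand-ins, not ACA's actual types. It caps the worker count at 2x the CPU count and takes the table lock once per batch instead of once per neighbor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one neighbor entry in an ACA payload.
struct Neighbor {
    std::string ip;
    std::string mac;
};

// Hypothetical stand-in for an ACA-internal table guarded by one mutex.
class NeighborTable {
public:
    // Inserts a whole batch while holding the lock once, not once per neighbor.
    void insert_batch(const Neighbor* first, std::size_t count) {
        std::lock_guard<std::mutex> guard(mu_);
        for (std::size_t i = 0; i < count; ++i) table_[first[i].ip] = first[i].mac;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> guard(mu_);
        return table_.size();
    }
private:
    std::mutex mu_;
    std::unordered_map<std::string, std::string> table_;
};

// Thread cap floated in the issue: 2x the number of CPUs.
// hardware_concurrency() may return 0 when unknown, so assume at least 1 CPU.
unsigned max_worker_threads() {
    unsigned cpus = std::thread::hardware_concurrency();
    return 2 * std::max(cpus, 1u);
}

// Splits `neighbors` into batches of `batch_size`; worker w handles batches
// w, w + workers, w + 2*workers, ... so no two workers share a batch.
void program_neighbors(const std::vector<Neighbor>& neighbors, NeighborTable& table,
                       std::size_t batch_size = 10) {
    assert(batch_size > 0);
    const std::size_t num_batches = (neighbors.size() + batch_size - 1) / batch_size;
    const std::size_t workers = std::min<std::size_t>(max_worker_threads(), num_batches);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (std::size_t b = w; b < num_batches; b += workers) {
                const std::size_t begin = b * batch_size;
                const std::size_t count = std::min(batch_size, neighbors.size() - begin);
                table.insert_batch(neighbors.data() + begin, count);
            }
        });
    }
    for (auto& t : pool) t.join();
}
```

Batching trades fewer lock acquisitions for longer hold times, so the batch size is worth sweeping (e.g. 1, 10, 100) with the 10 to 100,000 neighbor payloads above rather than fixing it up front.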

ACA handling of packet in message from OVS

  1. Framework to use: cbench (https://github.com/mininet/oflops/tree/master/cbench), with ACA acting as the OpenFlow controller under test
  2. Using the payload generated by cbench, test latency mode first, then throughput mode
  3. Collect latency and throughput metrics
  4. Identify bottlenecks and problem areas
  5. Once on-demand L3 routing rules are implemented, a VM may quickly open many new connections to a new neighbor, generating a burst of packet-in messages for ACA to process. We need to confirm that ACA can handle this load
  6. If ACA slows down, consider spinning up more threads to handle multiple packet-in messages in parallel
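Handling packet-in messages in parallel (item 6) usually means decoupling the thread that receives messages from the threads that process them. Below is a minimal sketch, assuming packet-in handling can be expressed as an independent callback per message; `PacketIn`, `PacketInQueue`, and `PacketInDispatcher` are hypothetical names, not ACA's actual OpenFlow code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an OpenFlow packet-in message received from OVS.
struct PacketIn {
    std::uint64_t id;
};

// Unbounded multi-producer / multi-consumer queue guarded by a mutex + condvar.
class PacketInQueue {
public:
    void push(PacketIn msg) {
        {
            std::lock_guard<std::mutex> guard(mu_);
            q_.push_back(msg);
        }
        cv_.notify_one();
    }
    // Blocks until a message is available; returns std::nullopt once closed and drained.
    std::optional<PacketIn> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        PacketIn msg = q_.front();
        q_.pop_front();
        return msg;
    }
    void close() {
        {
            std::lock_guard<std::mutex> guard(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<PacketIn> q_;
    bool closed_ = false;
};

// Runs `num_workers` threads that each pull messages and call `handle`.
// The handler runs outside the queue lock, so slow handling does not block the receiver.
class PacketInDispatcher {
public:
    PacketInDispatcher(std::size_t num_workers, std::function<void(const PacketIn&)> handle)
        : handle_(std::move(handle)) {
        for (std::size_t i = 0; i < num_workers; ++i)
            workers_.emplace_back([this] {
                while (auto msg = queue_.pop()) handle_(*msg);
            });
    }
    // Must not be called after shutdown(); such messages would never be handled.
    void submit(PacketIn msg) { queue_.push(msg); }
    // Stops accepting work, lets workers drain the queue, then joins them.
    void shutdown() {
        queue_.close();
        for (auto& t : workers_) t.join();
        workers_.clear();
    }
    ~PacketInDispatcher() { if (!workers_.empty()) shutdown(); }
private:
    std::function<void(const PacketIn&)> handle_;
    PacketInQueue queue_;
    std::vector<std::thread> workers_;
};
```

The queue here is unbounded, so under a sustained packet-in burst it grows rather than pushing back on OVS; measuring queue depth during the cbench throughput run would show whether a bounded queue or more workers is needed.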

Other Notes

  1. Can we use a framework like Seastar to improve the ACA threading model? https://github.com/futurewei-cloud/chogori-seastar-rd
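Seastar is built around a shared-nothing, shard-per-core design: each core owns its data, and other cores reach it by passing messages instead of taking locks. The sketch below is not Seastar's API; it is a plain C++ illustration of that idea, assuming ACA state can be partitioned by a key such as a neighbor IP. `Shard` and `ShardedStore` are hypothetical names, and unlike Seastar this sketch does not pin threads to CPUs.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Map = std::unordered_map<std::string, std::string>;

// One shard: a thread that exclusively owns `state_`. Other threads never touch
// `state_` directly; they only enqueue tasks into the shard's inbox.
class Shard {
public:
    using Task = std::function<void(Map&)>;
    Shard() : worker_([this] { run(); }) {}
    ~Shard() {
        { std::lock_guard<std::mutex> g(mu_); stopping_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void submit(Task t) {
        { std::lock_guard<std::mutex> g(mu_); inbox_.push_back(std::move(t)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            Task t;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return stopping_ || !inbox_.empty(); });
                if (inbox_.empty()) return;  // stopping and fully drained
                t = std::move(inbox_.front());
                inbox_.pop_front();
            }
            t(state_);  // no lock needed: only this thread ever touches state_
        }
    }
    std::mutex mu_;  // guards only the inbox, never the state
    std::condition_variable cv_;
    std::deque<Task> inbox_;
    bool stopping_ = false;
    Map state_;
    std::thread worker_;  // declared last so it starts after the members above exist
};

// Routes each key to a fixed shard by hash, so all operations on one key run
// in FIFO order on one thread without a lock around the data itself.
class ShardedStore {
public:
    explicit ShardedStore(std::size_t n) : shards_(n) { assert(n > 0); }
    void put(const std::string& key, const std::string& value) {
        shard_for(key).submit([key, value](Map& m) { m[key] = value; });
    }
    // Returns the stored value, or an empty string if the key is absent.
    std::string get(const std::string& key) {
        std::promise<std::string> p;
        std::future<std::string> f = p.get_future();
        shard_for(key).submit([&key, &p](Map& m) {
            auto it = m.find(key);
            p.set_value(it == m.end() ? std::string() : it->second);
        });
        return f.get();  // blocks until the owning shard has run the lookup
    }
private:
    Shard& shard_for(const std::string& key) {
        return shards_[std::hash<std::string>{}(key) % shards_.size()];
    }
    std::vector<Shard> shards_;
};
```

The benefit depends on how well ACA's state partitions: operations that touch data on several shards need cross-shard messages, which is where this model gets more complex than a narrowed lock.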

er1cthe0ne, Nov 25 '20 00:11