
Agent performance improvements

Open krdln opened this issue 1 year ago • 0 comments

Tackle low-hanging fruit in agent performance and remove major bottlenecks. Also try to reduce GC pressure.

image

The largest single CPU consumer (10%) is RequestToInputWithServices, which creates the input for Rego. We can avoid this cost for flows that don't match any classifier by running the selector matching stage before creating the input. The rest of the profile:

  • Classify function that invokes Rego: 6.5%
  • CheckRequest: 3.6%
  • Log processing: 10%
  • Batch processor (for sending cloud metrics): 5%
  • gRPC stream handling: 13%
  • Go runtime scheduling: 10%
  • GC: about 10%
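A minimal sketch of the "match selectors before building the Rego input" idea for RequestToInputWithServices. Everything here is a simplified, hypothetical stand-in (`Classifier`, `Flow`, `buildInput`, `classify`, and the `inputsBuilt` counter are not the agent's real types or functions); it only illustrates the ordering of the two stages.

```go
package main

import "fmt"

// Classifier is a hypothetical classifier with a simple label selector.
type Classifier struct {
	Name     string
	Selector map[string]string // every label must match for the classifier to apply
}

// Flow is a hypothetical, simplified request.
type Flow struct {
	Labels  map[string]string
	Headers map[string]string
}

// matches reports whether every selector label is present on the flow with the same value.
func (c Classifier) matches(f Flow) bool {
	for k, want := range c.Selector {
		if got, ok := f.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// inputsBuilt counts calls to buildInput so the effect of the fast path is visible.
var inputsBuilt int

// buildInput stands in for the expensive input construction done before Rego evaluation.
func buildInput(f Flow) map[string]any {
	inputsBuilt++
	input := make(map[string]any, len(f.Headers))
	for k, v := range f.Headers {
		input[k] = v
	}
	return input
}

// classify runs the cheap selector stage first and builds the input
// only when at least one classifier matched.
func classify(f Flow, classifiers []Classifier) []string {
	var matched []Classifier
	for _, c := range classifiers {
		if c.matches(f) {
			matched = append(matched, c)
		}
	}
	if len(matched) == 0 {
		return nil // fast path: no input is built for this flow
	}

	input := buildInput(f)
	_ = input // a real implementation would evaluate the matched rules against this input

	names := make([]string, 0, len(matched))
	for _, c := range matched {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	classifiers := []Classifier{
		{Name: "checkout", Selector: map[string]string{"service": "checkout"}},
	}
	fmt.Println(classify(Flow{Labels: map[string]string{"service": "cart"}}, classifiers), inputsBuilt)
	fmt.Println(classify(Flow{Labels: map[string]string{"service": "checkout"}}, classifiers), inputsBuilt)
}
```

In this sketch a non-matching flow returns before `buildInput` is ever called, so the saving scales with the share of traffic that has no matching classifier.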

  • [x] https://github.com/fluxninja/aperture/issues/2171
  • [ ] reduce mallocs
    • [x] actually profile mallocs @krdln
      • [ ] write it down somewhere
    • [x] #2212
    • [ ] try pooling in other places
  • [ ] skip rego for simple classifiers (extractors)
  • [ ] try to shrink check response (check response is marshaled in Check and unmarshaled in logs processing). (or even cache check responses as they're agent-to-agent?)
  • [ ] other bottlenecks
    • [x] deep copies of entities: https://github.com/fluxninja/aperture/pull/2178
    • [x] speed up sort in rollupprocessor.key: https://github.com/fluxninja/aperture/pull/2186
    • [ ] try to reduce Gets from attribute map
    • [ ] StartTracesOp is > 2%, what's that? This is caused by obsreport tracing, which we don't use
    • [x] AddCheckResponseBasedLabels: https://github.com/fluxninja/aperture/pull/2186
    • [ ] label conversions
  • [ ] investigate why we spend 12% in runtime.mcall: is it normal, or are we doing something weird with goroutines?
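For the "reduce mallocs" and "try pooling in other places" items, one common Go pattern is reusing scratch buffers through `sync.Pool`. The sketch below is only an illustration under assumptions: `encodeLabels` and `bufPool` are hypothetical names, and whether pooling pays off at any particular call site in the agent would need to be confirmed with an allocation profile.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so hot paths do not allocate a new one per call.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeLabels is a hypothetical hot-path helper that serializes labels
// in the given key order using a pooled buffer.
func encodeLabels(labels map[string]string, keys []string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf) // return the buffer once the result has been copied out

	for i, k := range keys {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.WriteString(k)
		buf.WriteByte('=')
		buf.WriteString(labels[k])
	}
	// String copies the bytes, so the result stays valid after the buffer is reused.
	return buf.String()
}

func main() {
	labels := map[string]string{"a": "1", "b": "2"}
	fmt.Println(encodeLabels(labels, []string{"a", "b"}))
}
```

The pool removes the per-call buffer allocation, but the returned string is still allocated; pooling helps most where the scratch space is large relative to the result.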

Items marked in italics require bigger changes.

krdln, Jun 20 '23 10:06