Veneur CPU usage based on metric packets processed per second.

Open prudhvi opened this issue 5 years ago • 5 comments

Hi team, we are seeing Veneur use about one logical CPU for roughly 60k metrics processed per second. We send stats to Veneur via the Java Datadog client over a Unix domain socket. We would like to know whether this usage is expected or whether we have misconfigured something.

Attached are the SignalFx charts showing these stats:

veneur.worker.metrics_processed_total chart:

Screen Shot 2019-11-11 at 2 01 18 PM

Logical CPUs used by Veneur chart:

Screen Shot 2019-11-11 at 2 01 30 PM

prudhvi avatar Nov 11 '19 22:11 prudhvi

cc @aditya-stripe @antifuchs Does this CPU usage look normal/expected to you?

prudhvi avatar Nov 11 '19 22:11 prudhvi

That's not expected. Could you share a CPU profile so we can see where that's getting clogged up?
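
For readers unsure what capturing a CPU profile involves, here is a minimal sketch using Go's standard `runtime/pprof` package. It profiles a stand-in workload inside the current process. The helper names `captureCPUProfile` and `busyLoop` are hypothetical, and nothing here is taken from Veneur's own code; how a running Veneur exposes profiling is not covered by this thread.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// captureCPUProfile (hypothetical helper) records a CPU profile of the
// current process while work runs and returns the raw pprof bytes.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile flushes the collected samples into buf before returning.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

// busyLoop is a stand-in workload (an assumption for illustration only);
// it spins on the CPU for roughly the given duration.
func busyLoop(d time.Duration) {
	deadline := time.Now().Add(d)
	counter := 0
	for time.Now().Before(deadline) {
		counter++
	}
	_ = counter
}

func main() {
	profile, err := captureCPUProfile(func() { busyLoop(200 * time.Millisecond) })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of profile data\n", len(profile))
}
```

Writing the returned bytes to a file lets `go tool pprof` open the profile and show where CPU time is spent.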

ChimeraCoder avatar Nov 11 '19 22:11 ChimeraCoder

Cool, thanks. I'm curious how far off we are: should it be handling double this, or 10x this?

profile_120sec.pdf profile_300sec.pdf

We have a flush interval of 30 seconds and run Veneur as a sidecar to the app in LXC containers, with the config below. Also, FYI, we run a fork of Veneur that does some metric-name parsing and renaming in the SignalFx sink (signalfx.go) before flushing to SignalFx.

  num_readers: 20
  num_span_workers: 10
  num_workers: 96
  read_buffer_size_bytes: 2097152
  metric_max_length: 32768
  span_channel_capacity: 100
  metric_max_length: 32768
  trace_max_length_bytes: 16384
  ssf_listen_addresses:
    - unix:///data/app/tmp/veneur-ssf.sock
  statsd_listen_addresses:
    - unixgram:///data/app/tmp/veneur.sock

I am attaching CPU profiles covering 120 seconds and 300 seconds.

prudhvi avatar Nov 12 '19 18:11 prudhvi

@aditya-stripe, sorry, but before I spend more time investigating the CPU bottleneck, I'm curious what throughput is expected per logical CPU in Veneur for metrics consumed via statsd.

I noticed similar metrics-processed throughput per logical CPU when using UDP. The https://github.com/stripe/veneur#benchmarks section also says 60k/sec, but I wanted to confirm: is that an old number that needs updating?
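
To put the 60k/sec figure in per-metric terms, the sketch below does the arithmetic. It assumes the logical CPU is fully busy and that all of its time goes to metric processing, so the result is an average upper bound and not a measurement. The function name `perMetricBudget` is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// perMetricBudget (hypothetical helper) returns the average CPU time spent
// per metric when `cpus` logical CPUs are fully busy handling
// `metricsPerSec` metrics each second.
func perMetricBudget(cpus, metricsPerSec float64) time.Duration {
	return time.Duration(cpus / metricsPerSec * float64(time.Second))
}

func main() {
	// One logical CPU at about 60k metrics per second, as reported above.
	fmt.Println("average CPU time per metric:", perMetricBudget(1, 60000))
}
```

At one logical CPU per 60k metrics per second, this works out to roughly 16.7 microseconds of CPU time per metric.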

prudhvi avatar Nov 13 '19 22:11 prudhvi

Also attaching the UDP profiles (captured when sending metrics via UDP instead of the Unix domain socket): udp_profile_300sec.pdf udp_profile_120sec.pdf

prudhvi avatar Nov 19 '19 01:11 prudhvi