
Any way to have multiple threads per carbon_ch destination without duplicating service?

Open percygrunwald opened this issue 3 years ago • 4 comments

From what I can see, each destination in a carbon_ch output block gets a single thread. We have a configuration like this:

cluster storage_machines
  carbon_ch
  1.2.3.4:2203
  1.2.3.5:2203
  ;
match *
  send to storage_machines
  stop
  ;

And the thread for the first output is maxed out (screenshot of per-thread CPU usage omitted). This results in queues rising until the limit is reached, at which point metrics start to get dropped.

Is there a way that we could get multiple threads for a single destination without changing the consistent hash?

I could do something like this:

cluster storage_machines
  carbon_ch
  1.2.3.4:2203=a
  1.2.3.4:2204=b
  1.2.3.5:2203=a
  1.2.3.5:2204=b
  ;

And have a reverse proxy on each storage host "merge" the streams. However, since there aren't really 2 instances on each storage machine, the consistent-hash view of tools like carbonate or buckytools would no longer be consistent with the relay's view.
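For reference, the "merge" on each storage host could itself be a small carbon-c-relay instance rather than a generic TCP proxy. A hedged sketch, assuming the multi-port listen block syntax of recent releases and a real carbon daemon on 127.0.0.1:2003 (both hypothetical here):

listen
  type linemode
    2203 proto tcp
    2204 proto tcp
  ;
cluster local_carbon
  forward
  127.0.0.1:2003
  ;
match *
  send to local_carbon
  stop
  ;

This accepts on both "virtual" ports and forwards everything to the single real daemon, but it does nothing to fix the hash-view mismatch described above.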

Another option would be to spin up additional carbon-c-relay services and load balance output between them:

cluster storage_writers
  any_of
  127.0.0.1:2004
  127.0.0.1:2005
  127.0.0.1:2006
  127.0.0.1:2007
  ;
match *
  send to storage_writers
  stop
  ;

Where each entry in storage_writers is another instance of carbon-c-relay with the original configuration shown at the top. This seems like a really long walk to get 4 threads writing to 1.2.3.4:2203. I guess another way would be to have multiple instances of carbon-c-relay and reverse proxy all incoming metrics to them. That eliminates one instance of carbon-c-relay, but we're still multiplying instances just to get multiple threads per destination. It would be nice to be able to specify the number of workers per destination, the same way we can specify the number of dispatchers with -w.
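To make the layering concrete: each storage_writers entry would be a separate carbon-c-relay process carrying the original cluster definition, differing only in its listen port. A sketch, assuming the -p (listen port) and -f (config file) flags:

carbon-c-relay -p 2004 -f writer.conf

where writer.conf for every writer instance is the original config:

cluster storage_machines
  carbon_ch
  1.2.3.4:2203
  1.2.3.5:2203
  ;
match *
  send to storage_machines
  stop
  ;

Since all four writers share an identical carbon_ch ring, the hash view stays consistent regardless of which writer a metric lands on.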

Thank you for any suggestions.

percygrunwald avatar Feb 04 '22 22:02 percygrunwald

Hmmm, by design I never included an option for parallel delivery. I need to understand what is causing your load: whether it is computing the hash, or lock contention on the main input queue (in which case threading won't help).

There is no way to do it right now, but since the code already has provisions to share an output queue, a global option like you mention may not be too difficult.

If I'd create an experimental patch would you be able to test if it has the desired effect?

grobian avatar Feb 05 '22 08:02 grobian

Hi @grobian, thank you for your reply.

If I'd create an experimental patch would you be able to test if it has the desired effect?

Absolutely, I would happily do that.

I realize I have also missed details about the release version and platform, I will confirm these and provide some more metrics tomorrow.

percygrunwald avatar Feb 06 '22 19:02 percygrunwald

Current env:

carbon-c-relay version: 3.4 (we should test with a newer release, I didn't realize we were on such an old release)
OS: Debian Jessie (kernel version 4.19)
CPU: Intel(R) Xeon(R) CPU E5-2699A v4 @ 2.40GHz

As a baseline looking at our current data, it looks like we are using around 2.6-3.3 us of wall time per metric for these carbon_ch outputs:

destinations.*.wallTime_us / destinations.*.sent

(screenshot of this ratio over time omitted)

Based on the maximum value, we should be able to send around 300k metrics per CPU-second, which is consistent with what we observed last week.

(screenshot of observed send rate omitted)
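The arithmetic above can be checked directly. A quick sanity calculation, not relay code; the 3.3 us figure is the worst case from the measured range:

```python
# One output thread is CPU-bound, so its throughput is bounded by the
# wall time spent per metric. Using the worst case observed above:
wall_time_us_per_metric = 3.3  # observed range was 2.6-3.3 us

# Metrics one fully busy CPU-second can push through this thread:
metrics_per_cpu_second = 1_000_000 / wall_time_us_per_metric
print(f"{metrics_per_cpu_second:,.0f} metrics per CPU-second")  # ~303,030
```

At the lower bound of 2.6 us the same calculation gives roughly 385k metrics per CPU-second, so ~300k is the conservative ceiling per destination thread.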

I will try to test with a newer release tomorrow and report back if there is any change to the performance.

percygrunwald avatar Feb 08 '22 00:02 percygrunwald

Ok, I'll wait for that. If I have some cycles before then, I'll see if I can prepare something anyway.

grobian avatar Feb 08 '22 08:02 grobian