opentelemetry-collector

Documentation improvement: tuning the collector for stability & performance

Open djluck opened this issue 1 year ago • 3 comments

Problem

I've just spent a week tuning our deployed otel collectors to improve stability and performance. I've not seen any discussion around tuning best practices, and I think the project could benefit from a document that outlines some recommended parameters.

Solution

Parameters I have in mind discussing include:

  • Setting max_recv_msg_size_mib for the OTLP receiver to the batch size most clients are using (should be 512 by default according to the docs) multiplied by the expected maximum size of a single message. For example, given a max expected message size of 100 KiB, we'd end up with 512 * 100 KiB = 50 MiB (about 52.4 MB), so a recommended value of around 52 MiB leaves a little headroom. Without this, it's easy for the collector to drop data when receiving large batches (the max message size is only 4 MiB by default).
  • For clients with large maximum messages, setting send_batch_max_size in the batch processor to be big enough to match what clients are sending; otherwise they'll see excessive splitting of batches. The splitting seems to be very memory intensive: I observed a 5x drop in memory utilization after changing send_batch_max_size: 250 to send_batch_max_size: 512 (image attached).
  • Enabling advanced compression (e.g. zstd) for collector-to-collector transmission, as otel collectors all support compression algorithms beyond gzip.
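
To make the three suggestions concrete, here is a rough sketch of how they might look together in one collector config. The endpoints (including the downstream-collector hostname) and the traces-only pipeline are placeholders I made up for illustration, and the numbers are just the example figures from the bullets above, not tested recommendations. Note that send_batch_size is lowered alongside send_batch_max_size, since as far as I know the batch processor expects the max to be at least as large as send_batch_size.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317          # placeholder listen address
        # 512 records * 100 KiB = 50 MiB; 52 leaves a little headroom.
        # The gRPC default is 4 MiB.
        max_recv_msg_size_mib: 52

processors:
  batch:
    # Match the batch size clients send so batches are not split repeatedly.
    send_batch_size: 512
    send_batch_max_size: 512

exporters:
  otlp:
    endpoint: downstream-collector:4317  # hypothetical second-tier collector
    compression: zstd                    # instead of the default gzip

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The right values will depend on what your clients actually send, so it is worth measuring real batch and message sizes before copying any of these numbers.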

djluck · Nov 25 '24 20:11