
kelvin OTEL export over gRPC exhausts the max size of gRPC payloads

Open caphrim007 opened this issue 2 years ago • 7 comments

Describe the bug

The problem sounds very similar to

  • https://github.com/open-telemetry/opentelemetry-collector/issues/1494

It manifests as the following in kelvin

exec.cc:59] Query 74f08441-b8fc-456a-af18-2aa60587ddf7 failed, reason: Internal : OTel export (carnot node_id=483) failed with error 'RESOURCE_EXHAUSTED'. Details: grpc: received message after decompression larger than max (5148431 vs. 4194304)
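
As a quick check of the two numbers in that error, the limit is exactly 4 MiB and the rejected message overshoots it by a little under 1 MB. The variable names below are only for illustration:

```python
import math

# Values copied from the error message above.
received_bytes = 5148431
limit_bytes = 4194304

# The limit is exactly 4 MiB, the gRPC default receive size.
assert limit_bytes == 4 * 1024 * 1024

# How far over the limit the decompressed message was.
excess_bytes = received_bytes - limit_bytes

# Lower bound on how many messages this payload would need to be split into.
min_messages = math.ceil(received_bytes / limit_bytes)

print(excess_bytes, min_messages)
```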

From the linked issue, this comment stood out

I cannot find such a thing "infinity" in gRPC (0 means default 4MB)

It seems like the batches Pixie is sending are too large, and it may need to split them into smaller batches that stay under that 4MB threshold.
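
A minimal sketch of the kind of splitting I mean, in Python rather than Pixie's own code. Everything here is hypothetical: `split_into_batches`, `size_of`, and `overhead_bytes` are names made up for illustration, and this is not how kelvin's exporter is actually structured. It greedily packs items into batches whose summed size stays under a budget. Summing per-item sizes only approximates the final serialized message size, so the sketch leaves some headroom below the limit.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# 4194304 bytes, the limit reported in the error above.
GRPC_DEFAULT_MAX_BYTES = 4 * 1024 * 1024


def split_into_batches(
    items: Iterable[T],
    size_of: Callable[[T], int],
    max_bytes: int = GRPC_DEFAULT_MAX_BYTES,
    overhead_bytes: int = 1024,
) -> Iterator[List[T]]:
    """Greedily group items so each batch's summed size fits the budget.

    overhead_bytes is an assumed safety margin for message framing and
    per-item encoding overhead; the right value depends on the encoding.
    """
    budget = max_bytes - overhead_bytes
    batch: List[T] = []
    batch_bytes = 0
    for item in items:
        item_bytes = size_of(item)
        if item_bytes > budget:
            # A single oversized item cannot be fixed by batching alone.
            raise ValueError(f"item of {item_bytes} bytes exceeds budget {budget}")
        if batch and batch_bytes + item_bytes > budget:
            # Adding this item would overflow, so emit the current batch first.
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += item_bytes
    if batch:
        yield batch


# Example: six pre-serialized 1 MB payloads, sized with len().
points = [b"x" * 1_000_000 for _ in range(6)]
batches = list(split_into_batches(points, len))
print([len(b) for b in batches])
```

Each yielded batch would then go out as its own export request, so nothing is dropped, only split up.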

To Reproduce

This is an intermittent problem and seems to occur more readily with large clusters or clusters with lots of metric points being exported over OTEL.

Expected behavior

Pixie should correctly chunk data to send to OTEL so that no data is lost.

Screenshots

Logs

pixie_logs_20230328140709.zip

App information (please complete the following information):

  • Pixie version: PEM version is 0.12.18, is that what is being asked for here?
  • K8s cluster version: v1.23.13-eks-fb459a0
  • Node Kernel version: 5.4.226-129.415.amzn2.x86_64
  • Browser version: Chrome 111.0.5563.110 (Official Build) (x86_64)


caphrim007 avatar Mar 28 '23 21:03 caphrim007

I have a proposed fix for this that I'll submit for review in a bit.

caphrim007 avatar Jul 14 '23 18:07 caphrim007