kelvin OTEL export over gRPC exhausts the max size of gRPC payloads
Describe the bug
The problem sounds very similar to
- https://github.com/open-telemetry/opentelemetry-collector/issues/1494
It manifests as the following in kelvin:
exec.cc:59] Query 74f08441-b8fc-456a-af18-2aa60587ddf7 failed, reason: Internal : OTel export (carnot node_id=483)
failed with error 'RESOURCE_EXHAUSTED'. Details: grpc: received message after decompression larger than max
(5148431 vs. 4194304)
From the linked issue, this comment stood out:
I cannot find such a thing "infinity" in gRPC (0 means default 4MB)
It seems that the batches Pixie sends are too large, and it may need to split them into smaller batches that do not exceed the 4MB threshold.
To Reproduce
This is an intermittent problem and seems to occur more readily with large clusters or clusters with lots of metric points being exported over OTEL.
Expected behavior
Pixie should correctly chunk data to send to OTEL so that no data is lost.
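To illustrate the kind of chunking meant here, below is a minimal sketch in Python. It is not Pixie's actual exporter code and not the proposed fix. It assumes each metric point is already serialized to bytes, and `chunk_by_size` and `per_record_overhead` are hypothetical names introduced only for this example. A real implementation would measure the size of the encoded protobuf request instead of estimating it.

```python
from typing import Iterable, Iterator, List

# gRPC's default maximum receive size; matches the 4194304 in the error above.
GRPC_DEFAULT_MAX_BYTES = 4 * 1024 * 1024


def chunk_by_size(
    records: Iterable[bytes],
    max_bytes: int = GRPC_DEFAULT_MAX_BYTES,
    per_record_overhead: int = 16,
) -> Iterator[List[bytes]]:
    """Yield batches whose estimated size stays at or below max_bytes.

    `records` stand in for serialized metric points (an assumption).
    `per_record_overhead` is a hypothetical allowance for framing bytes
    added when the records are wrapped into a single request.
    """
    batch: List[bytes] = []
    batch_size = 0
    for record in records:
        cost = len(record) + per_record_overhead
        if cost > max_bytes:
            # A single record over the limit cannot be fixed by batching.
            raise ValueError("single record exceeds the maximum message size")
        if batch and batch_size + cost > max_bytes:
            # Adding this record would cross the limit: flush the batch first.
            yield batch
            batch, batch_size = [], 0
        batch.append(record)
        batch_size += cost
    if batch:
        # Flush the remainder so that no record is dropped.
        yield batch
```

Each yielded batch would then go out as its own export request, so every record is still delivered while each request stays under the receiver's limit.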
Screenshots
Logs
App information:
- Pixie version: PEM version is 0.12.18, is that what is being asked for here?
- K8s cluster version: v1.23.13-eks-fb459a0
- Node Kernel version: 5.4.226-129.415.amzn2.x86_64
- Browser version: Chrome 111.0.5563.110 (Official Build) (x86_64)
Additional context
I have a proposed fix for this that I'll submit for review shortly.