datafusion-ballista

Consider using gRPC streams + chunking to avoid message size limits

Open andygrove opened this issue 2 years ago • 2 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Ballista gRPC messages can become very large when physical plans reference many partitions, so there is always a risk of hitting the maximum message size.

Describe the solution you'd like

gRPC streams can be used to transmit data in chunks.
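
To make the idea concrete, here is a minimal sketch of the chunking logic only, written against plain byte slices. It is not Ballista or tonic code: the `Chunk` type and the `split_into_chunks` and `reassemble` helpers are hypothetical names introduced for illustration, and in a real service each `Chunk` would be one message on a gRPC stream.

```rust
/// Hypothetical chunk frame (an assumption for this sketch, not a Ballista type).
/// Each frame would be sent as one message on a gRPC stream.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    /// Position of this chunk in the stream, starting at 0.
    index: u32,
    /// True only for the final chunk of the payload.
    is_last: bool,
    /// At most `max_chunk_bytes` bytes of the serialized payload.
    data: Vec<u8>,
}

/// Split an already-serialized payload into frames of bounded size.
fn split_into_chunks(payload: &[u8], max_chunk_bytes: usize) -> Vec<Chunk> {
    assert!(max_chunk_bytes > 0, "chunk size must be positive");
    if payload.is_empty() {
        // Still emit one terminating frame so the receiver sees end-of-payload.
        return vec![Chunk { index: 0, is_last: true, data: Vec::new() }];
    }
    // Number of frames, rounding up.
    let total = (payload.len() + max_chunk_bytes - 1) / max_chunk_bytes;
    payload
        .chunks(max_chunk_bytes)
        .enumerate()
        .map(|(i, part)| Chunk {
            index: i as u32,
            is_last: i + 1 == total,
            data: part.to_vec(),
        })
        .collect()
}

/// Rebuild the payload on the receiving side, rejecting gaps,
/// reordering, and streams that end before the final frame.
fn reassemble(chunks: impl IntoIterator<Item = Chunk>) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut expected = 0u32;
    let mut finished = false;
    for chunk in chunks {
        if finished {
            return Err("chunk received after final chunk".to_string());
        }
        if chunk.index != expected {
            return Err(format!("expected chunk {}, got {}", expected, chunk.index));
        }
        out.extend_from_slice(&chunk.data);
        finished = chunk.is_last;
        expected += 1;
    }
    if finished {
        Ok(out)
    } else {
        Err("stream ended before final chunk".to_string())
    }
}
```

With this shape, no single message exceeds `max_chunk_bytes` plus a small header, so the total payload size is no longer bounded by the gRPC maximum message size.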

Describe alternatives you've considered

Additional context

andygrove, Dec 12 '23 14:12

I wonder if this is still needed now that we propagate the gRPC max message size and the IpcWriter is set to a 2MB chunk size.

milenkovicm, Feb 27 '25 19:02

One strategy we used to reduce message size is pruning the partitions before encoding them as a task. This keeps the message size fairly constant regardless of how many partitions there are in the query (and also saves on some decoding / encoding and bandwidth).
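
A rough sketch of what such pruning could look like, not the actual implementation: it assumes the plan carries one list of shuffle locations per partition and that a task reads only its own partition. `PartitionLocation` here is a simplified stand-in and `prune_for_task` is a hypothetical helper.

```rust
/// Simplified stand-in for a shuffle partition location
/// (an assumption for this sketch, not the real Ballista struct).
#[derive(Debug, Clone, PartialEq)]
struct PartitionLocation {
    partition_id: usize,
    path: String,
}

/// Hypothetical helper: keep the locations of the partition this task reads
/// and drop the rest. The outer Vec keeps its length so partition indices
/// stay valid; only the per-partition lists of other partitions are emptied.
fn prune_for_task(
    all: &[Vec<PartitionLocation>],
    task_partition: usize,
) -> Vec<Vec<PartitionLocation>> {
    all.iter()
        .enumerate()
        .map(|(i, locations)| {
            if i == task_partition {
                locations.clone()
            } else {
                Vec::new()
            }
        })
        .collect()
}
```

Under these assumptions the encoded task carries location entries for one partition only, so its size depends on the number of upstream locations for that partition rather than on the total partition count.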

Dandandan, Feb 27 '25 20:02

I believe this has been implemented in #1318. We can reopen if more work is needed.

milenkovicm, Sep 14 '25 09:09