
Kafka batch of messages too large

Open jagij opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe.
We get "KafkaJSProtocolError: The request included a message larger than the max message size the server will accept" when a batch of messages is too large to send in one go. The individual messages are small enough to be sent to Kafka.

Describe the solution you'd like
The ideal interface would be to send messages in smaller batches no matter the size of the batch. This lets users choose the optimal produce size instead of having to settle for a suboptimal solution (like a fixed batch size based on experiments, or worse).

An implementation could request those max sizes from Kafka before trying to send, and then chop the batches into smaller chunks if necessary. In our workaround, we catch the error, calculate an approximation of the size (using JSON.stringify), and then retry automatically with smaller chunks. The size that works is then stored so that future batches don't need to go through the retry mechanism. This workaround is implemented on top of the sendBatch call, so it is completely optional.
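A rough sketch of that workaround might look like the following. This is not kafkajs API; `approxSize`, `chunkBySize`, and `sendChunked` are illustrative names, and the JSON.stringify sizing is only an approximation of the wire size:

```javascript
// Approximate a message's size via JSON.stringify. This is a proxy,
// not the exact Kafka wire size, so keep some headroom in the limit.
const approxSize = (msg) => Buffer.byteLength(JSON.stringify(msg));

// Split messages into chunks whose approximate total stays under maxBytes.
// Throws early if a single message can never fit on its own.
function chunkBySize(messages, maxBytes) {
  const chunks = [[]];
  let currentBytes = 0;
  for (const msg of messages) {
    const size = approxSize(msg);
    if (size > maxBytes) {
      throw new Error(`single message of ~${size} bytes exceeds ${maxBytes}`);
    }
    if (currentBytes + size > maxBytes && chunks[chunks.length - 1].length > 0) {
      chunks.push([]);
      currentBytes = 0;
    }
    chunks[chunks.length - 1].push(msg);
    currentBytes += size;
  }
  return chunks;
}

// Optional wrapper on top of a kafkajs producer: try the full batch first,
// and fall back to smaller chunks if the broker rejects it as too large.
async function sendChunked(producer, topic, messages, maxBytes) {
  try {
    await producer.send({ topic, messages });
  } catch (err) {
    if (err.name !== 'KafkaJSProtocolError') throw err;
    for (const chunk of chunkBySize(messages, maxBytes)) {
      await producer.send({ topic, messages: chunk });
    }
  }
}
```

Note that if the fallback loop itself fails partway through, some chunks will already have been delivered, which is exactly the partial-send ambiguity described below.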

Note that this naïve approach makes it impossible for a client to figure out which messages have been sent and which have not, as opposed to the current implementation, where you know nothing has been sent. One troublesome case is when a single message is too large to send even by itself: everything up to that message could be sent without a problem, but the message itself will never make it through. It's easy enough to first loop through the messages and throw an error if a single message is too large, but that requires the implementation to know the max size upfront.

Additional context
Other implementations, such as the Go and Java clients, do automatic batching behind the scenes. The Go client uses a channel; the Java client returns a Future and continues processing asynchronously.

Automatic batching would also be nice for the JavaScript client. I had a look at the implementation and saw that it first groups the messages by topic and partition (which happen to be the same for the whole batch in our use case), so implementing it at that level seems preferable to my proposal above.

jagij avatar Nov 10 '20 12:11 jagij

An async producer has a lot of challenges. What should we do if you decide to terminate the service while the background queue is full? What happens if the broker is updated to accept a smaller size while we already have messages in the queue? Etc. We chose to yield this responsibility back to userland; that's why you can await the producer's send and be sure that by the time it resolves, the messages are in Kafka.

But I like the idea of fetching the max message size and validating the batch. We could do this together with the metadata refresh calls to keep the value up to date. Maybe you lack the necessary "tools" in the library to implement a good async producer. I am not against a built-in background producer; it just comes with more decisions and corner cases. But I would love to see a proposal and discuss it.

tulios avatar Nov 10 '20 12:11 tulios

Thanks for the quick reply!

I guess it's easy to do async in JavaScript with a bunch of promises and then awaiting them. But that wouldn't take advantage of batching: each message would be sent in a separate request to the right Kafka broker instead of being bundled with the others.

I feel like you could hide the batching in a way that just sends the first message (from the first send call) and, while it's still sending, queues up the messages that are submitted in the meantime. When that previous message (or batch) has finished, it can send the currently accumulated batch of messages. That would hide the complexity from the user. Of course, there are still race conditions to consider when the size of a single message becomes too large due to updated settings.
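That accumulate-while-in-flight idea can be sketched in a few lines. This is not kafkajs code: `sendFn` stands in for something like `producer.send`, and the factory name is made up for illustration:

```javascript
// Coalescing sender sketch: while one send is in flight, enqueue new
// messages; flush everything accumulated in a single batch once it finishes.
// `sendFn` is assumed to be an async ({ topic, messages }) => ... call.
function createCoalescingSender(sendFn, topic) {
  let pending = [];
  let inFlight = null;

  const flush = async () => {
    // Keep draining: messages enqueued during an await are picked up
    // by the next loop iteration, so callers can safely await inFlight.
    while (pending.length > 0) {
      const batch = pending;
      pending = [];
      await sendFn({ topic, messages: batch });
    }
    inFlight = null;
  };

  return (message) => {
    pending.push(message);
    if (!inFlight) inFlight = flush();
    return inFlight; // resolves once this message's batch has been sent
  };
}
```

With this shape, the first message goes out on its own and everything submitted while it is in transit gets bundled into the next batch, which is the behavior described above.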

I don't think that changing those size settings to a smaller value would ever be a good idea though. I think I read something about that in the Kafka docs.

I'm curious how other implementations work when those size settings are updated in flight. Interesting.

jagij avatar Nov 10 '20 13:11 jagij

A related question: is there a way to actually get the maximum message size with kafkajs? I am trying to implement what jagij suggested and split my messages into batches with a size limit of X bytes, based on the Kafka message size configuration.
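One way to read the limit is through the admin client's describeConfigs: the topic-level config is `max.message.bytes` (its broker-level default is `message.max.bytes`). An untested sketch, with the broker address and topic name as placeholders:

```javascript
// Pure helper: pull `max.message.bytes` out of a describeConfigs response.
function extractMaxMessageBytes(describeConfigsResult) {
  const entry = describeConfigsResult.resources[0].configEntries.find(
    (e) => e.configName === 'max.message.bytes'
  );
  return entry ? parseInt(entry.configValue, 10) : null;
}

// Fetch the topic's effective limit via the kafkajs admin client.
async function getMaxMessageBytes(topic) {
  const { Kafka, ConfigResourceTypes } = require('kafkajs');
  const kafka = new Kafka({ brokers: ['localhost:9092'] }); // placeholder broker
  const admin = kafka.admin();
  await admin.connect();
  try {
    const result = await admin.describeConfigs({
      includeSynonyms: false,
      resources: [{ type: ConfigResourceTypes.TOPIC, name: topic }],
    });
    return extractMaxMessageBytes(result);
  } finally {
    await admin.disconnect();
  }
}
```

Keep in mind the value you get back bounds the compressed batch size on the broker side, so a client-side JSON.stringify estimate should leave some headroom under it.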

alolis avatar Mar 12 '21 18:03 alolis

relatedly, is there a way to set max.request.size on the producer?

timc13 avatar Dec 17 '21 17:12 timc13

> relatedly, is there a way to set max.request.size on the producer?

Has a solution been found yet?

tgbv avatar Apr 17 '23 07:04 tgbv

> relatedly, is there a way to set max.request.size on the producer?

I'm currently looking for an answer to this as well.

Juancho997 avatar Jun 14 '23 17:06 Juancho997