CPU spike when using eachMessage
Environment Information
- OS: node-alpine image running in ECS Fargate
- Node Version: 20
- NPM Version: 20
- C++ Toolchain: Visual Studio Code
- confluent-kafka-javascript version: latest
Steps to Reproduce
We are currently using the confluent-kafka-javascript library to consume messages from an AWS MSK (Provisioned) cluster. Our implementation relies on the eachMessage handler, with the expectation that messages are consumed and processed one at a time, in order. The behavior matches our expectations for message ordering and processing, but we are observing significant CPU spikes when consumption begins, especially when the topic contains a large number of messages.
We are trying to determine whether the library is pre-fetching or consuming messages in the background, which might be contributing to the spike. If that’s the case, is there a way to control or throttle message consumption to mitigate the CPU usage?
I have attached the sample code.
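(The attachment isn't reproduced here; for context, a minimal sketch of an eachMessage consumer using the library's KafkaJS-compatible API is below. Broker, topic, and group IDs are placeholders, not the actual values from our setup.)

```js
// Minimal sketch of the consumption pattern (placeholder broker/topic/group values).
const { Kafka } = require("@confluentinc/kafka-javascript").KafkaJS;

const kafka = new Kafka({
  kafkaJS: { brokers: ["my-msk-broker:9092"] },
});

const consumer = kafka.consumer({
  kafkaJS: { groupId: "my-consumer-group" },
});

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topics: ["my-topic"] });

  // Messages within a partition are handed to eachMessage one at a time, in order.
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`${topic}[${partition}] ${message.value.toString()}`);
    },
  });
}

run().catch(console.error);
```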
confluent-kafka-javascript Configuration Settings
Basic settings; no additional settings provided.
Yes, while the consumer is active and the subscribed topics have messages, we prefetch them on background threads and queue them internally. You can reduce the fetching frequency using the config properties 'fetch.queue.backoff.ms' and 'queued.max.messages.kbytes'.
'queued.max.messages.kbytes' controls how much data can be prefetched; the default is 64MB. 'fetch.queue.backoff.ms' specifies the backoff applied when that threshold is breached; it is 1s by default. Setting it to a higher value reduces CPU usage (at the cost of higher latency in some cases).
In your case, I would first reduce 'queued.max.messages.kbytes' and see whether it reduces the usage. Further config properties: https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md
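A minimal sketch of setting those properties, assuming the KafkaJS-compatible API where librdkafka properties sit alongside the kafkaJS block (values are illustrative, not recommendations):

```js
const consumer = kafka.consumer({
  kafkaJS: { groupId: "my-consumer-group" }, // placeholder group id
  // librdkafka properties go next to the kafkaJS block:
  "queued.max.messages.kbytes": 1024, // prefetch at most ~1MB per partition (default 65536, i.e. 64MB)
  "fetch.queue.backoff.ms": 2000,     // back off longer once that threshold is hit (default 1000)
});
```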
Is there a way to control the prefetch limit by message count? Our messages are only about 400 bytes each.
No, it cannot be lower than 1KB (it accepts integer values). If you are setting it to such a low value, also read the documentation for 'fetch.message.max.bytes', as that is important too. Pre-fetching batches is quite important for performance (and for reducing network calls, since we fetch batches rather than individual messages).
If there are 7000 messages in the topic, even with 'fetch.message.max.bytes' set, won't it still continuously keep polling for messages, which could still cause a CPU spike?
No, it stops fetching a given partition when it reaches 'queued.max.messages.kbytes' (64MB) or 'queued.min.messages' (100000). You can reduce those values if you need to. Also, when there are no new messages, Kafka does long polling with a maximum wait of 'fetch.wait.max.ms' (500 ms), which avoids busy loops.
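A sketch of the count-based and polling-related properties mentioned above, with illustrative values for small (~400-byte) messages:

```js
const consumer = kafka.consumer({
  kafkaJS: { groupId: "my-consumer-group" }, // placeholder group id
  "queued.min.messages": 1000,        // stop prefetching a partition once ~1000 messages are queued (default 100000)
  "queued.max.messages.kbytes": 1024, // or cap the queue by size instead (default 65536)
  "fetch.wait.max.ms": 500,           // broker long-poll wait when there is no new data (default 500)
});
```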