kafkajs
'publish' performance optimization under high parallelism: avoid taking the lock when it is not needed
Every 'publish' call goes through 'addMultipleTargetTopics', which always acquires an async lock.
Under high publish parallelism (a few thousand waiters or more), the lock becomes slow and can cause errors such as: KafkaJSLockTimeout: Timeout while acquiring lock (2162 waiting locks): "updating target topics"
This PR skips taking the lock when it is not needed, which increases throughput by ~20% under medium load and by more under high load. It also reduces the need to increase requestTimeout under high parallelism.
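The "skip the lock when it is not needed" idea can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the kafkajs code from this PR: `SimpleLock`, `TargetTopics`, `lockAcquisitions`, `demo`, and the `refreshMetadata` callback are hypothetical names, and the sketch assumes the lock is only needed when at least one requested topic is not yet a known target topic. Callers first check, without locking, whether all topics are already known; only otherwise do they take the lock and re-check under it (a double-checked pattern) before refreshing metadata.

```javascript
// Hypothetical promise-based mutex with a FIFO waiter queue and an acquire
// timeout. It only illustrates the idea; it is NOT kafkajs's internal lock.
class SimpleLock {
  constructor({ timeout = 1000 } = {}) {
    this.locked = false
    this.waiters = []
    this.timeout = timeout
  }

  acquire() {
    if (!this.locked) {
      this.locked = true
      return Promise.resolve()
    }
    return new Promise((resolve, reject) => {
      const waiter = { resolve, timer: null }
      waiter.timer = setTimeout(() => {
        // Give up: drop this waiter from the queue and fail the caller.
        this.waiters = this.waiters.filter(w => w !== waiter)
        reject(new Error(`Timeout while acquiring lock (${this.waiters.length} waiting locks)`))
      }, this.timeout)
      this.waiters.push(waiter)
    })
  }

  release() {
    const next = this.waiters.shift()
    if (next) {
      clearTimeout(next.timer)
      next.resolve() // hand the lock directly to the next waiter
    } else {
      this.locked = false
    }
  }
}

// Hypothetical stand-in for the state that 'addMultipleTargetTopics' updates.
class TargetTopics {
  constructor(refreshMetadata) {
    this.topics = new Set()
    this.lock = new SimpleLock({ timeout: 1000 })
    this.refreshMetadata = refreshMetadata
    this.lockAcquisitions = 0
  }

  async addMultipleTargetTopics(topics) {
    // Fast path: all topics are already known, so skip the lock entirely.
    if (topics.every(topic => this.topics.has(topic))) return

    await this.lock.acquire()
    this.lockAcquisitions++
    try {
      // Re-check under the lock: an earlier holder may have already added
      // these topics while this caller was waiting.
      const missing = topics.filter(topic => !this.topics.has(topic))
      if (missing.length === 0) return
      await this.refreshMetadata(missing)
      // Record topics only after a successful refresh, so a failed refresh
      // is retried by the next caller.
      missing.forEach(topic => this.topics.add(topic))
    } finally {
      this.lock.release()
    }
  }
}

// Usage: a cold-start burst of 100 concurrent publishes, then a warm burst.
async function demo() {
  let refreshes = 0
  const state = new TargetTopics(async () => { refreshes++ })
  const burst = () =>
    Promise.all(Array.from({ length: 100 }, () => state.addMultipleTargetTopics(['orders'])))

  await burst() // topics unknown: every caller queues on the lock
  const coldAcquisitions = state.lockAcquisitions
  await burst() // topics known: every caller takes the fast path
  return { refreshes, coldAcquisitions, warmAcquisitions: state.lockAcquisitions - coldAcquisitions }
}

// prints { refreshes: 1, coldAcquisitions: 100, warmAcquisitions: 0 }
demo().then(result => console.log(result))
```

The fast path only helps once the topics are known: in the cold-start burst all 100 callers still queue on the lock, and the re-check under the lock keeps the metadata refresh to a single call. In the warm burst no caller touches the lock, so no waiter queue builds up and none of them can hit the acquire timeout.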
@tulios, I still need to fix a few tests that check the number of calls to 'refreshMetadata', which is now reduced. Does this change make sense?
@assaf-xm I think you should wait on this until we know what's happening with https://github.com/tulios/kafkajs/pull/667.
@t-d-d, if 'addMultipleTargetTopics' is removed, then yes, there is no need for this PR, but #667 has been in code review for months.
@t-d-d, any updates here? I have hit the described issue in production at a produce rate of 6k/sec.
@t-d-d, I don't see any progress on #667.
I'm facing a similar issue:
0|ludo-ws-s1 | KafkaJSNonRetriableError: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at /home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/retry/index.js:55:18 {
0|ludo-ws-s1 | name: 'KafkaJSNonRetriableError',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: KafkaJSLockTimeout: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at Timeout.
I am also facing this issue during high-scale performance testing that simulates our production scenario. Without forking this repo to make this simple code change, we can't use this library. Please merge it!
I'm facing the exact same issue. Could these changes be merged? I see that #667 is closed.