kafkajs

'publish' performance optimization under high parallelism: avoid taking the lock when it isn't needed

Open assaf-xm opened this issue 3 years ago • 9 comments

Every message publish goes through 'addMultipleTargetTopics', which always acquires an async lock.

The lock becomes slow under high publish parallelism (a few thousand waiters or more) and can cause errors such as: KafkaJSLockTimeout: Timeout while acquiring lock (2162 waiting locks): "updating target topics"

This PR skips taking the lock when it isn't needed, which increases throughput by ~20% for medium loads and by more for high loads. It also reduces the need to increase requestTimeout under high parallelism.
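The idea described above (only take the lock when the target topics actually change) is essentially double-checked locking: check without the lock, and re-check once the lock is held. The following is a minimal sketch under a simplified model of the producer, not the PR's actual diff or kafkajs internals. 'AsyncLock', 'TargetTopics' and the injected 'refreshMetadata' callback are hypothetical names used for illustration only.

```javascript
// Hypothetical sketch, not kafkajs source: a minimal promise-based mutex.
class AsyncLock {
  constructor() {
    this.tail = Promise.resolve() // resolves when the current holder releases
    this.acquisitions = 0 // counts how many callers queued on the lock
  }

  // Runs fn exclusively; each caller waits for the previous caller's release.
  async run(fn) {
    this.acquisitions++
    const prev = this.tail
    let release
    this.tail = new Promise(resolve => (release = resolve))
    await prev
    try {
      return await fn()
    } finally {
      release()
    }
  }
}

// Hypothetical model of the producer's target-topic bookkeeping.
class TargetTopics {
  constructor(refreshMetadata) {
    this.topics = new Set()
    this.lock = new AsyncLock()
    this.refreshMetadata = refreshMetadata
  }

  async addMultipleTargetTopics(topics) {
    // Fast path: every topic is already known, so skip the lock entirely.
    if (topics.every(topic => this.topics.has(topic))) return

    await this.lock.run(async () => {
      // Re-check under the lock: an earlier waiter may have added them already.
      const missing = topics.filter(topic => !this.topics.has(topic))
      if (missing.length === 0) return
      await this.refreshMetadata([...this.topics, ...missing])
      // Record topics only after the refresh succeeds, so fast-path callers
      // never run ahead of an in-flight refresh.
      missing.forEach(topic => this.topics.add(topic))
    })
  }
}
```

In this sketch, 5,000 concurrent calls for a topic that is not yet known all queue on the lock once, but only the first one triggers a metadata refresh; once the topic is recorded, later calls return on the fast path without touching the lock. This also illustrates why the number of 'refreshMetadata' calls drops.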

assaf-xm, Sep 14 '21 22:09

@tulios, I need to fix a few tests that check the number of calls to 'refreshMetadata', which is now reduced. Does this change make sense?

assaf-xm, Sep 14 '21 23:09

@assaf-xm, I think you should wait on this until we know what's happening with https://github.com/tulios/kafkajs/pull/667.

t-d-d, Sep 18 '21 07:09

@t-d-d, if 'addMultipleTargetTopics' is removed, then yes, there is no need for this PR, but I see that #667 has been in code review for months.

assaf-xm, Sep 26 '21 05:09

@t-d-d, any updates here? I have hit the described issue in production at a produce rate of 6k/sec.

suvorovis, Oct 08 '21 15:10

@t-d-d, I don't see any progress on #667.

assaf-xm, Nov 22 '21 07:11

I'm facing a similar issue:

0|ludo-ws-s1 | KafkaJSNonRetriableError: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at /home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/retry/index.js:55:18 {
0|ludo-ws-s1 | name: 'KafkaJSNonRetriableError',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: KafkaJSLockTimeout: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at Timeout. (/home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/utils/lock.js:48:23)
0|ludo-ws-s1 | at Timeout.wrapped [as _onTimeout] (/home/ec2-user/zupee-ludo/ludo.service/node_modules/wtfnode/index.js:197:27)
0|ludo-ws-s1 | at listOnTimeout (internal/timers.js:549:17)
0|ludo-ws-s1 | at processTimers (internal/timers.js:492:7) {
0|ludo-ws-s1 | name: 'KafkaJSLockTimeout',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: undefined
0|ludo-ws-s1 | }
0|ludo-ws-s1 | }
>>>>>>>>>>>>>>>> 22 Unhandled Rejection at Promise >>>>>>>>>>>>>>>> Promise {
0|ludo-ws-s1 | KafkaJSNonRetriableError: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at /home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/retry/index.js:55:18 {
0|ludo-ws-s1 | name: 'KafkaJSNonRetriableError',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: KafkaJSLockTimeout: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at Timeout. (/home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/utils/lock.js:48:23)
0|ludo-ws-s1 | at Timeout.wrapped [as _onTimeout] (/home/ec2-user/zupee-ludo/ludo.service/node_modules/wtfnode/index.js:197:27)
0|ludo-ws-s1 | at listOnTimeout (internal/timers.js:549:17)
0|ludo-ws-s1 | at processTimers (internal/timers.js:492:7) {
0|ludo-ws-s1 | name: 'KafkaJSLockTimeout',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: undefined
0|ludo-ws-s1 | }
0|ludo-ws-s1 | }
0|ludo-ws-s1 | }
0|ludo-ws-s1 | This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
0|ludo-ws-s1 | KafkaJSNonRetriableError: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at /home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/retry/index.js:55:18

atiquefiroz, Aug 25 '23 08:08

I am also facing this issue during high-scale performance testing that simulates our production scenario. Without forking this repo and applying this simple code change, we can't use this library. Please merge it!

emorneau, Apr 17 '24 19:04

I'm facing the exact same issue. Could these changes be merged? I saw that #667 is closed.

delarosaj, Apr 17 '24 20:04