Kafka trigger poor performance in queue mode

Open mdbetancourt opened this issue 1 year ago • 3 comments

Describe the bug
I'm getting poor performance on n8n even in queue mode, because triggers are only processed by the main process.

To Reproduce
Steps to reproduce the behavior (a sketch of the queue-mode .env is shown after the list):

  1. run sudo docker run --env-file=.env --rm --name=n8n-main -p 80:5678 registry.gitlab.com/nepuntobiz/nemobile/n8n:develop
  2. run sudo docker run --env-file=.env -d --rm --name=n8n-webhook -p 5678:5678 registry.gitlab.com/nepuntobiz/nemobile/n8n:develop n8n webhook
  3. run sudo docker run --env-file=.env -d --rm --name=n8n-worker registry.gitlab.com/nepuntobiz/nemobile/n8n:develop n8n worker --concurrency=32
  4. add a Kafka node and push 50k records to Kafka
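
For reference, here is a minimal sketch of the `.env` passed via `--env-file` above, limited to the queue-mode settings; hostnames and credentials are placeholders, not the reporter's actual values:

```
# Queue mode: the main process enqueues executions, workers run them
EXECUTIONS_MODE=queue

# Redis backing the Bull queue
QUEUE_BULL_REDIS_HOST=redis
QUEUE_BULL_REDIS_PORT=6379

# Shared Postgres database (required so main and workers see the same data)
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres
DB_POSTGRESDB_DATABASE=n8n
DB_POSTGRESDB_USER=n8n
DB_POSTGRESDB_PASSWORD=change-me
```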

Expected behavior
Spread the tasks across the 32 worker processes instead of running everything in the main process (where the UI also runs).

mdbetancourt · Sep 09 '22 01:09

Hello @mdbetancourt, sorry for the performance issues.

There are actually two stages to the Kafka trigger. Because it requires a persistent connection to Kafka, all 50k executions would be started from n8n's main process; this is a limitation of "non-HTTP" triggers in n8n.

The main process is the only one that can start those executions. Once started, they should be handed off to worker processes for the remainder of the execution (all other nodes) and then returned to the main process for wrap-up.
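
Roughly, the split looks like the sketch below. This is not n8n's source code, just an illustration of the pattern described above; it assumes the `bull` and `kafkajs` packages, and the queue name, topic, and connection details are made up.

```ts
// Illustrative sketch only, not n8n internals.
import Bull from 'bull';
import { Kafka } from 'kafkajs';

// Redis-backed queue shared by the main process and the workers
const executionQueue = new Bull('executions', {
  redis: { host: 'redis', port: 6379 },
});

// Main process: owns the persistent Kafka connection and only *starts* executions
async function mainProcess(): Promise<void> {
  const kafka = new Kafka({ brokers: ['kafka:9092'] });
  const consumer = kafka.consumer({ groupId: 'example-trigger' });
  await consumer.connect();
  await consumer.subscribe({ topics: ['example-topic'] });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // Each Kafka message becomes a queued job instead of being executed inline
      await executionQueue.add({ payload: message.value?.toString() });
    },
  });
}

// Worker process: picks jobs up and runs the rest of the workflow
function workerProcess(): void {
  executionQueue.process(32, async (job) => {
    // Placeholder for "all other nodes" of the workflow
    console.log('running execution', job.id, job.data.payload);
  });
}

// In queue mode, something like mainProcess() runs in the n8n-main container
// and workerProcess() in the n8n-worker container.
```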

Is this the behavior you are seeing? If so, this is unfortunately by design; we know it is a scaling issue at the moment, and we are looking to address it in the future.

krynble · Sep 09 '22 10:09

@krynble yes it is; this issue even makes the main process crash and hang.

After a while I get this:

execution time: 2381
query is slow: INSERT INTO "public"."execution_entity
.....
....
<--- Last few GCs --->

[7:0x7f0db01553e0]   298278 ms: Mark-sweep 4033.4 (4138.7) -> 4016.2 (4137.2) MB, 2132.1 / 0.2 ms  (average mu = 0.165, current mu = 0.125) task scavenge might not succeed
[7:0x7f0db01553e0]   300791 ms: Mark-sweep 4032.4 (4137.2) -> 4018.1 (4138.9) MB, 2332.5 / 0.1 ms  (average mu = 0.120, current mu = 0.072) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
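
(The ~4 GB figure in the GC log above looks like Node's default old-space ceiling; as a stopgap it can be raised with `--max-old-space-size`, for example via `NODE_OPTIONS` on the main container. The value below is purely illustrative, and raising it may only delay the OOM rather than fix it.)

```sh
# Stopgap sketch: raise the Node heap limit on the main container (value is illustrative)
sudo docker run --env-file=.env -e NODE_OPTIONS="--max-old-space-size=8192" \
  --rm --name=n8n-main -p 80:5678 registry.gitlab.com/nepuntobiz/nemobile/n8n:develop
```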

mdbetancourt · Sep 09 '22 17:09

Ah yes, I am sorry to hear that. You can temporarily tune some of the settings on the Kafka Trigger node, such as the maximum number of requests, as shown below.

[image: Kafka Trigger node options]

This might help improve reliability at the cost of reduced throughput.
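
For context, the trigger is built on the kafkajs client, so the option mentioned above presumably maps onto the consumer's `maxInFlightRequests` setting. A rough standalone sketch of that kind of consumer-side throttling follows; the broker, topic, and group names are placeholders, and this is not the node's real implementation.

```ts
import { Kafka } from 'kafkajs';

async function run(): Promise<void> {
  const kafka = new Kafka({ clientId: 'example-client', brokers: ['kafka:9092'] });

  const consumer = kafka.consumer({
    groupId: 'example-group',
    maxInFlightRequests: 1,  // fewer concurrent requests: slower, but gentler on memory
    sessionTimeout: 30000,   // ms before the broker considers the consumer dead
    heartbeatInterval: 3000, // ms between heartbeats to the broker
  });

  await consumer.connect();
  await consumer.subscribe({ topics: ['example-topic'] });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(topic, partition, message.value?.toString());
    },
  });
}

run().catch(console.error);
```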

krynble · Sep 14 '22 08:09