
[BUG] KOP Topic Partitions are assigned to the same Brokers and are not Balanced. Latency is Large.

Open dave2wave opened this issue 3 years ago • 3 comments

Describe the bug We are using OpenMessaging/Benchmark (OMB) to test KOP on a Pulsar cluster. The topics are created through Kafka, but they are not equally distributed across the brokers, and when a topic has more than one partition, two of its partitions often land on the same broker. Publishing latency is also very high with multiple partitions per topic.

To Reproduce Steps to reproduce the behavior:

  1. OMB is built with Kafka 2.8.1.
  2. Deploy a Pulsar cluster (with stats enabled) that has KOP configured, using @eolivelli's packaging of the 2.8.0 versions.
  3. Run a Benchmark with the following driver:
name: Kafka
driverClass: io.openmessaging.benchmark.driver.kafka.KafkaBenchmarkDriver

# Kafka client-specific configuration
replicationFactor: 3

topicConfig: |
  min.insync.replicas=2

commonConfig: |
  bootstrap.servers=localhost:9092

producerConfig: |
  acks=all
  linger.ms=0
  batch.size=131072

consumerConfig: |
  auto.offset.reset=earliest
  enable.auto.commit=false
  max.partition.fetch.bytes=10485760
  fetch.min.bytes=0
  fetch.max.wait.ms=2
  max.poll.records=10000

and Workload:

name: 1m-10-topics-3-partitions-100b
topics: 10
partitionsPerTopic: 3
messageSize: 100
useRandomizedPayloads: true
randomBytesRatio: 0.5
randomizedPayloadPoolSize: 1000
subscriptionsPerTopic: 1
consumerPerSubscription: 1
producersPerTopic: 1
producerRate: 1000000
consumerBacklogSizeGB: 0
testDurationMinutes: 15
  4. In the output you can see that consumption lags production, building a backlog, and publish latency is very high. (Does the KOP protocol adaptor do work that a Pulsar client would normally do?)
  5. In Prometheus I look at pulsar_rate_out, which shows how the topic partitions are allocated across brokers. Here is an example (a Kafka-side cross-check sketch follows the listing):
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-rEiG-p0-0008-partition-1"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-rEiG-p0-0008-partition-0"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-pHSsLME-0006-partition-2"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-oCQyzso-0000-partition-2"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-iBD6BV0-0002-partition-2"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-Ydf-IDw-0005-partition-1"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-WQmNrUc-0009-partition-2"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-TRlpq58-0007-partition-2"}
instance="10.0.0.71:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-Ih8ENuE-0001-partition-1"}
----
instance="10.0.0.238:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-pHSsLME-0006-partition-1"}
instance="10.0.0.238:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-Ydf-IDw-0005-partition-2"}
instance="10.0.0.238:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-Ih8ENuE-0001-partition-0"}
----
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-rEiG-p0-0008-partition-2"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-oCQyzso-0000-partition-1"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-iBD6BV0-0002-partition-1"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-WQmNrUc-0009-partition-0"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-V4JHFxw-0004-partition-2"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-V4JHFxw-0004-partition-1"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-TRlpq58-0007-partition-1"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-Ih8ENuE-0001-partition-2"}
instance="10.0.0.203:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-IJjnPBc-0003-partition-0"}
----
instance="10.0.0.174:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-oCQyzso-0000-partition-0"}
instance="10.0.0.174:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-iBD6BV0-0002-partition-0"}
instance="10.0.0.174:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-Ydf-IDw-0005-partition-0"}
instance="10.0.0.174:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-WQmNrUc-0009-partition-1"}
----
instance="10.0.0.134:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-pHSsLME-0006-partition-0"}
instance="10.0.0.134:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-V4JHFxw-0004-partition-0"}
instance="10.0.0.134:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-TRlpq58-0007-partition-0"}
instance="10.0.0.134:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-IJjnPBc-0003-partition-2"}
instance="10.0.0.134:8080",job="pulsar",namespace="public/default",topic="persistent://public/default/test-topic-IJjnPBc-0003-partition-1"}

Expected behavior

  1. Each partition of a topic should be on a different broker; however, we very often see 2 of a topic's 3 partitions land on the same broker. This should only happen when the number of brokers is less than the number of partitions (a Pulsar-side lookup sketch follows this list).
  2. Not counting the offset and transaction partitions, each broker should host the same number of partitions for KOP topics (±1).
  3. (Bonus) Publishing latency should improve with more partitions.
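
Ownership can also be confirmed from the Pulsar side with the admin client rather than Prometheus. A minimal sketch, assuming the broker HTTP endpoint at http://localhost:8080 and a placeholder topic name:

import org.apache.pulsar.client.admin.PulsarAdmin;

public class PulsarSideLookup {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // any broker's HTTP endpoint
                .build()) {
            String topic = "persistent://public/default/test-topic"; // placeholder name
            int partitions = admin.topics().getPartitionedTopicMetadata(topic).partitions;
            for (int i = 0; i < partitions; i++) {
                String partition = topic + "-partition-" + i;
                // lookupTopic resolves the partition to the broker that currently owns it
                System.out.printf("%s -> %s%n", partition, admin.lookups().lookupTopic(partition));
            }
        }
    }
}

Because Pulsar assigns each topic to a namespace bundle by hashing the full topic name, and the partitions of one topic are distinct topics named ...-partition-N, several partitions can hash into bundles owned by the same broker. That is consistent with the clustering shown in the metrics above, though it does not by itself explain the latency.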

Screenshots A good result with 1 partition per topic looks like this, but only when the topics end up nearly balanced across the brokers:

23:21:18.929 [main] INFO - --------------- WORKLOAD : 1m-10-topics-1-partition-100b --- DRIVER : Kafka---------------
23:21:19.351 [main] INFO - Created 10 topics in 123.565828 ms
23:21:19.368 [main] INFO - Created 10 consumers in 12.374813 ms
23:21:19.372 [main] INFO - Created 10 producers in 3.550182 ms
23:21:19.372 [main] INFO - Waiting for consumers to be ready
23:21:22.704 [main] INFO - All consumers are ready
23:21:22.708 [main] INFO - ----- Starting warm-up traffic ------
23:21:33.109 [main] INFO - Pub rate 1063100.2 msg/s / 101.4 MB/s | Cons rate 1062662.0 msg/s / 101.3 MB/s | Backlog:  4.5 K | Pub Latency (ms) avg: 33.8 - 50%:  5.8 - 99%: 523.1 - 99.9%: 608.2 - Max: 627.9
23:21:43.334 [main] INFO - Pub rate 1014505.2 msg/s / 96.8 MB/s | Cons rate 1014528.6 msg/s / 96.8 MB/s | Backlog:  4.3 K | Pub Latency (ms) avg:  5.5 - 50%:  4.9 - 99%: 15.7 - 99.9%: 28.7 - Max: 46.5
23:21:53.504 [main] INFO - Pub rate 1002987.3 msg/s / 95.7 MB/s | Cons rate 1003060.6 msg/s / 95.7 MB/s | Backlog:  3.6 K | Pub Latency (ms) avg:  5.4 - 50%:  4.9 - 99%: 15.3 - 99.9%: 28.1 - Max: 41.9
23:22:03.663 [main] INFO - Pub rate 1001157.3 msg/s / 95.5 MB/s | Cons rate 1001164.7 msg/s / 95.5 MB/s | Backlog:  3.5 K | Pub Latency (ms) avg:  5.5 - 50%:  4.9 - 99%: 16.1 - 99.9%: 31.2 - Max: 42.2
23:22:13.829 [main] INFO - Pub rate 999496.1 msg/s / 95.3 MB/s | Cons rate 999420.6 msg/s / 95.3 MB/s | Backlog:  4.2 K | Pub Latency (ms) avg:  5.4 - 50%:  4.9 - 99%: 15.6 - 99.9%: 28.2 - Max: 36.7
23:22:23.987 [main] INFO - Pub rate 1000505.5 msg/s / 95.4 MB/s | Cons rate 1000552.0 msg/s / 95.4 MB/s | Backlog:  3.8 K | Pub Latency (ms) avg:  5.5 - 50%:  4.9 - 99%: 16.1 - 99.9%: 31.9 - Max: 40.6
23:22:24.204 [main] INFO - ----- Aggregated Pub Latency (ms) avg: 10.5 - 50%:  5.0 - 95%: 13.3 - 99%: 172.9 - 99.9%: 562.9 - 99.99%: 619.3 - Max: 627.9
23:22:24.290 [main] INFO - ----- Starting benchmark traffic ------
23:22:34.445 [main] INFO - Pub rate 1030385.7 msg/s / 98.3 MB/s | Cons rate 1030183.5 msg/s / 98.2 MB/s | Backlog:  5.8 K | Pub Latency (ms) avg:  7.3 - 50%:  5.3 - 99%: 42.8 - 99.9%: 82.4 - Max: 106.2
23:22:44.605 [main] INFO - Pub rate 999509.1 msg/s / 95.3 MB/s | Cons rate 999570.4 msg/s / 95.3 MB/s | Backlog:  5.2 K | Pub Latency (ms) avg:  5.5 - 50%:  4.9 - 99%: 15.8 - 99.9%: 31.8 - Max: 49.8
23:22:54.754 [main] INFO - Pub rate 1000977.5 msg/s / 95.5 MB/s | Cons rate 1001240.5 msg/s / 95.5 MB/s | Backlog:  2.5 K | Pub Latency (ms) avg:  5.5 - 50%:  4.9 - 99%: 15.4 - 99.9%: 33.1 - Max: 45.4
23:23:04.909 [main] INFO - Pub rate 999338.9 msg/s / 95.3 MB/s | Cons rate 999250.7 msg/s / 95.3 MB/s | Backlog:  3.4 K | Pub Latency (ms) avg:  5.5 - 50%:  4.9 - 99%: 15.4 - 99.9%: 38.4 - Max: 56.6
23:23:15.071 [main] INFO - Pub rate 1000088.5 msg/s / 95.4 MB/s | Cons rate 1000156.3 msg/s / 95.4 MB/s | Backlog:  2.7 K | Pub Latency (ms) avg:  5.6 - 50%:  5.0 - 99%: 15.3 - 99.9%: 26.7 - Max: 41.0
23:23:25.216 [main] INFO - Pub rate 1001035.0 msg/s / 95.5 MB/s | Cons rate 1000762.9 msg/s / 95.4 MB/s | Backlog:  5.5 K | Pub Latency (ms) avg:  5.5 - 50%:  4.9 - 99%: 14.1 - 99.9%: 28.2 - Max: 37.1

A bad result with 3 partitions per topic looks like this:

23:03:57.616 [main] INFO - --------------- WORKLOAD : 1m-10-topics-3-partitions-100b --- DRIVER : Kafka---------------
23:03:58.593 [main] INFO - Created 10 topics in 519.283047 ms
23:03:58.660 [main] INFO - Created 10 consumers in 63.300584 ms
23:03:58.677 [main] INFO - Created 10 producers in 16.174427 ms
23:03:58.677 [main] INFO - Waiting for consumers to be ready
23:04:02.647 [main] INFO - All consumers are ready
23:04:02.651 [main] INFO - ----- Starting warm-up traffic ------
23:04:13.300 [main] INFO - Pub rate 928095.6 msg/s / 88.5 MB/s | Cons rate 874004.9 msg/s / 83.4 MB/s | Backlog: 568.7 K | Pub Latency (ms) avg: 676.7 - 50%: 449.3 - 99%: 3123.6 - 99.9%: 4366.4 - Max: 4483.7
23:04:23.586 [main] INFO - Pub rate 844131.3 msg/s / 80.5 MB/s | Cons rate 805962.6 msg/s / 76.9 MB/s | Backlog: 963.4 K | Pub Latency (ms) avg: 1337.3 - 50%: 331.2 - 99%: 5931.1 - 99.9%: 6162.2 - Max: 6197.2
23:04:33.819 [main] INFO - Pub rate 829710.7 msg/s / 79.1 MB/s | Cons rate 753724.3 msg/s / 71.9 MB/s | Backlog: 1740.6 K | Pub Latency (ms) avg: 1464.8 - 50%: 133.0 - 99%: 5919.1 - 99.9%: 6021.0 - Max: 6069.7
23:04:44.047 [main] INFO - Pub rate 855410.0 msg/s / 81.6 MB/s | Cons rate 740713.1 msg/s / 70.6 MB/s | Backlog: 2914.1 K | Pub Latency (ms) avg: 1464.3 - 50%: 167.4 - 99%: 6095.1 - 99.9%: 6157.5 - Max: 6191.2
23:04:54.289 [main] INFO - Pub rate 803753.6 msg/s / 76.7 MB/s | Cons rate 764182.8 msg/s / 72.9 MB/s | Backlog: 3319.4 K | Pub Latency (ms) avg: 1464.7 - 50%: 151.6 - 99%: 6195.9 - 99.9%: 6489.1 - Max: 6557.6
23:05:04.518 [main] INFO - Pub rate 848135.9 msg/s / 80.9 MB/s | Cons rate 757874.7 msg/s / 72.3 MB/s | Backlog: 4242.5 K | Pub Latency (ms) avg: 1428.5 - 50%: 129.1 - 99%: 5933.1 - 99.9%: 6007.0 - Max: 6038.5
23:05:04.859 [main] INFO - ----- Aggregated Pub Latency (ms) avg: 1291.8 - 50%: 235.9 - 95%: 5451.1 - 99%: 6005.4 - 99.9%: 6374.1 - 99.99%: 6495.0 - Max: 6557.6
23:05:04.988 [main] INFO - ----- Starting benchmark traffic ------
23:05:15.198 [main] INFO - Pub rate 887500.6 msg/s / 84.6 MB/s | Cons rate 811810.8 msg/s / 77.4 MB/s | Backlog: 5014.5 K | Pub Latency (ms) avg: 1440.3 - 50%: 122.8 - 99%: 5836.4 - 99.9%: 5934.8 - Max: 5971.5
23:05:25.414 [main] INFO - Pub rate 848118.0 msg/s / 80.9 MB/s | Cons rate 751744.3 msg/s / 71.7 MB/s | Backlog: 5999.0 K | Pub Latency (ms) avg: 1425.3 - 50%: 119.8 - 99%: 5878.2 - 99.9%: 5955.1 - Max: 5999.8
23:05:35.646 [main] INFO - Pub rate 840792.6 msg/s / 80.2 MB/s | Cons rate 762692.5 msg/s / 72.7 MB/s | Backlog: 6797.7 K | Pub Latency (ms) avg: 1444.3 - 50%: 109.7 - 99%: 5928.1 - 99.9%: 5979.8 - Max: 6004.2
23:05:45.867 [main] INFO - Pub rate 844616.8 msg/s / 80.5 MB/s | Cons rate 763827.8 msg/s / 72.8 MB/s | Backlog: 7623.9 K | Pub Latency (ms) avg: 1432.4 - 50%: 110.3 - 99%: 5937.7 - 99.9%: 5988.7 - Max: 6037.6
23:05:56.108 [main] INFO - Pub rate 834191.7 msg/s / 79.6 MB/s | Cons rate 747615.3 msg/s / 71.3 MB/s | Backlog: 8510.5 K | Pub Latency (ms) avg: 1457.2 - 50%: 125.0 - 99%: 6040.9 - 99.9%: 6231.4 - Max: 6288.2
23:06:06.315 [main] INFO - Pub rate 841145.0 msg/s / 80.2 MB/s | Cons rate 762573.9 msg/s / 72.7 MB/s | Backlog: 9312.3 K | Pub Latency (ms) avg: 1461.2 - 50%: 130.5 - 99%: 5996.7 - 99.9%: 6044.7 - Max: 6065.0

Additional context The AWS cluster consisted of the following:

public_key_path = "~/.ssh/pulsar_aws.pub"
region          = "us-west-2"
ami             = "ami-9fa343e7" // RHEL-7.4

instance_types = {
  "pulsar"      = "i3.4xlarge"
  "zookeeper"   = "t3.small"
  "client"      = "c5.2xlarge"
  "prometheus"  = "t3.small"
}

num_instances = {
  "client"      = 15
  "pulsar"      = 5
  "zookeeper"   = 3
  "prometheus"  = 1
}

dave2wave · Dec 07 '21 17:12

FYI - The fix for some of this behavior may be in https://github.com/apache/pulsar/pull/13069

I'll know more Thursday.

Thanks.

dave2wave · Dec 07 '21 17:12

> FYI - The fix for some of this behavior may be in apache/pulsar#13069
>
> I'll know more Thursday.

Nope

dave2wave · Jan 03 '22 22:01

I've tested in various ways.

  • Producing with Pulsar clients and consuming with Kafka clients works well.
  • Producing with Kafka clients and consuming with Pulsar clients works well.
  • Producing and consuming with Kafka clients at high rates causes contention in the following broker thread (a hot-threads sketch follows the dump line):
"BookKeeperClientWorker-OrderedExecutor-0-0" #72 prio=5 os_prio=0 cpu=1817945.45ms elapsed=7066.77s tid=0x00007f5fc1b509a0 nid=0x3770 runnable  [0x00007f5ea64e2000]

dave2wave · Jan 03 '22 22:01