SerializingProducer is much slower than Producer in Python

Status: Open • zacharydestefano89 opened this issue 3 years ago • 6 comments

Description

I was working on code to produce messages to a Kafka topic. The messages are protobuf bytes, and I used SerializingProducer to pass the schema information. I also tried a separate method that imitated what was done here.

It was able to produce and flush messages at a rate of about 12 messages per second. For my use case, this is way too slow.

When I just used Producer and took out any schema information, the rate suddenly jumped to ~100s of messages per second.
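For reference, here is a minimal sketch of the two code paths being compared. It assumes a hypothetical protobuf-generated module `user_pb2` with a `User` message (the `user_i_d` field name is taken from the logs below); the broker address, Schema Registry URL, and topic name are placeholders:

```python
from confluent_kafka import Producer, SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.protobuf import ProtobufSerializer

import user_pb2  # hypothetical protobuf-generated module

sr_client = SchemaRegistryClient({'url': 'https://<schema-registry>'})

# Schema-aware path (slow in this report): the serializer handles schema
# registration/lookup and wire framing for each value.
schema_producer = SerializingProducer({
    'bootstrap.servers': '<broker>',
    'value.serializer': ProtobufSerializer(user_pb2.User, sr_client),
})
schema_producer.produce('my-topic', value=user_pb2.User(user_i_d='...'))
schema_producer.flush()

# Plain path (fast): serialize to bytes yourself; no registry involved.
plain_producer = Producer({'bootstrap.servers': '<broker>'})
plain_producer.produce('my-topic',
                       value=user_pb2.User(user_i_d='...').SerializeToString())
plain_producer.flush()
```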

How to reproduce

  1. Write a job to put thousands of messages onto a Kafka topic
  2. Have the job put schema information into each message and time it
  3. Compare it to the same job that does not put in schema information

Checklist

Please provide the following information:

  • [x] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):

From requirements.txt with the Python library: confluent-kafka==1.7.0

From console:

>>> import confluent_kafka
>>> confluent_kafka.libversion()
('1.7.0', 17236223)
>>> confluent_kafka.version()
('1.7.0', 17235968)
  • [x] Apache Kafka broker version: Confluent Cloud

  • [x] Client configuration: {...}

Producer config:

{'bootstrap.servers': '...',
 'error_cb': <function error_cb at 0x7fd2dc01f820>,
 'sasl.mechanism': 'PLAIN',
 'sasl.password': '***************************',
 'sasl.username': '***************',
 'security.protocol': 'SASL_SSL'}
  • [x] Operating system:

Run from docker container derived from Python 3.8.8 base

First line of Dockerfile: FROM python:3.8.8

  • [x] Provide client logs (with 'debug': '..' as necessary)

Using SerializingProducer:

INFO:root:Now adding 221 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:05:42 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:05:42.031972+00:00 : Adding message starting `user_i_d: "******` onto Kafka buffer under topic `***`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 221 messages: 34.54440498352051 seconds

Using Producer:

[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now adding 54 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:06:16.675951+00:00 : Adding message starting `b'\n\****\x1` onto Kafka buffer under topic `****`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 54 messages: 0.18948936462402344 seconds
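For the chunks logged above, that works out to roughly 221 / 34.54 ≈ 6.4 messages per second with SerializingProducer versus 54 / 0.189 ≈ 285 messages per second with the plain Producer, a gap of more than 40x.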
  • [x] Critical issue: Not critical, have a workaround

zacharydestefano89 • Oct 06 '22 20:10

> It was able to produce and flush messages at a rate of about 12 messages per second

Are you flushing after every produce? (This will be slow.)

mhowlett • Oct 19 '22 17:10
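Flushing after every produce blocks until that single message is delivered, so throughput collapses to round-trip latency. A common batching pattern, shown here as a sketch (the broker address and the `payloads` iterable are placeholder assumptions), is to produce in a loop, call poll(0) to serve delivery callbacks, and flush once at the end:

```python
from confluent_kafka import Producer

p = Producer({'bootstrap.servers': '<broker>'})  # placeholder address

for payload in payloads:   # `payloads`: an iterable of bytes (assumption)
    p.produce('my-topic', value=payload)
    p.poll(0)              # serve delivery callbacks without blocking

p.flush()                  # block once, at the end, for all in-flight messages
```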

> It was able to produce and flush messages at a rate of about 12 messages per second
>
> Are you flushing after every produce? (This will be slow.)

I tried both flushing after every produce and flushing only after producing many messages. In both cases, messages were put onto the topic at the aforementioned rate of about 12 per second.

zacharydestefano89 • Oct 19 '22 18:10

> ~100s of messages per second

You should be able to get tens of thousands of messages per second without the protobuf serdes. I don't have a good feel for how performant the protobuf serdes are (and you don't say anything about the size of your messages), but 12 per second seems very low.

It doesn't seem like we have a benchmark application for Python; we should write one (marking as enhancement).

mhowlett • Oct 24 '22 17:10
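As a rough illustration of what such a benchmark could measure (a sketch, not the tool proposed above; the broker address, topic, message count, and payload size are all placeholders):

```python
import time
from confluent_kafka import Producer

N = 10_000                                        # placeholder message count
payload = b'x' * 100                              # placeholder ~100-byte payload
p = Producer({'bootstrap.servers': '<broker>'})   # placeholder address

start = time.perf_counter()
for _ in range(N):
    p.produce('bench-topic', value=payload)
    p.poll(0)                                     # serve delivery callbacks
p.flush()
elapsed = time.perf_counter() - start

print(f'{N / elapsed:.0f} messages/second')
```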

I get the feeling it is doing a schema-registry lookup for each message, which would explain the low throughput. Maybe worth checking, somehow?

edenhill • Oct 24 '22 17:10
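One way to check (my suggestion, under the assumption that the client's SchemaRegistryClient issues its HTTP calls through requests/urllib3): turn on DEBUG logging for urllib3 and watch whether a registry request is logged per produce call:

```python
import logging

# With urllib3 DEBUG logging enabled, each HTTP request to the Schema
# Registry (e.g. a POST to /subjects/<subject>/versions) appears in the
# log. One such line per produce() would confirm a per-message lookup.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('urllib3').setLevel(logging.DEBUG)
```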

I reported the unnecessary lookup in 2020: https://github.com/confluentinc/confluent-kafka-python/issues/935. It was fixed by https://github.com/confluentinc/confluent-kafka-python/pull/1133, which shipped in 1.8.2, so I think upgrading to 1.8.2+ should fix this.

CTCC1 • Oct 25 '22 21:10
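A quick way to verify the installed client is new enough (a sketch; it assumes the usual MAJOR.MINOR.PATCH version string):

```python
import confluent_kafka

# Parse '1.7.0' -> (1, 7, 0) and compare numerically rather than lexically.
installed = tuple(int(x) for x in confluent_kafka.version()[0].split('.')[:3])
print('has the #1133 fix:', installed >= (1, 8, 2))
```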

Can you please confirm if it was fixed with the version upgrade?

pranavrth • Feb 20 '24 12:02