
Why does confluent-kafka not send more than 5k messages with a 1MB payload?

Open AkshayAwate opened this issue 4 years ago • 11 comments

Description

I am running some tests. I first sent 5 messages with a 1MB payload, then 50, then 500 with the same payload, and all of these worked well. But when I send 5k messages, it throws the following error:


`%5|1617340039.942|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out ProduceRequest in flight (after 950ms, timeout #0)
%4|1617340039.942|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out 1 in-flight, 0 retry-queued, 1 out-queue, 1 partially-sent requests
%3|1617340039.942|FAIL|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: 2 request(s) timed out: disconnect (after 302251ms in state UP)
%5|1617340040.949|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out ProduceRequest in flight (after 979ms, timeout #0)
%4|1617340040.949|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out 1 in-flight, 0 retry-queued, 1 out-queue, 1 partially-sent requests
%3|1617340040.949|FAIL|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: 2 request(s) timed out: disconnect (after 1001ms in state UP, 1 identical error(s) suppressed)
Processed 5000 messsages in 303.98 seconds
15.69 MB/s
16.45 Msgs/s
%4|1617340041.663|TERMINATE|rdkafka#producer-1| [thrd:app]: Producer terminating with 897 messages (1100181264 bytes) still in queue or transit: use flush() to wait for outstanding message delivery`

My configs:

`p = Producer({'bootstrap.servers': '10.x.x.x:19092',
    'sasl.username': kafka_user, 'compression.codec':'snappy',
    'sasl.password': kafka_password, 'sasl.mechanisms':'PLAIN', 'security.protocol': 'SASL_PLAINTEXT',
    'message.max.bytes':'1000000000', 'queue.buffering.max.messages': '10000000', 'message.max.bytes' :'1000000000',
    'queue.buffering.max.kbytes': '2147483647', 'queue.buffering.max.ms' : '500', 'queue.buffering.max.messages':'10000000'})`

I have read that the maximum message payload is 1MB. How should I proceed with larger payloads? Is there anything I am missing?

How to reproduce

Checklist

Please provide the following information:

  • [ ] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): latest
  • [ ] Apache Kafka broker version:
  • [ ] Client configuration: {...}
  • [ ] Operating system: ubuntu 18.02
  • [ ] Provide client logs (with 'debug': '..' as necessary)
  • [ ] Provide broker log excerpts
  • [ ] Critical issue

AkshayAwate avatar Apr 02 '21 05:04 AkshayAwate

It seems odd that the ProduceRequests are timing out after only one second. Are you sure that you are not setting request.timeout.ms or message.timeout.ms?

edenhill avatar Apr 06 '21 13:04 edenhill

@edenhill No, I am not setting request.timeout.ms or message.timeout.ms in my config.

AkshayAwate avatar Apr 06 '21 14:04 AkshayAwate

Ah, I think I see what is going on. Your produce rate is too high for the network/cluster, so messages are queued in the client, and by the time they are eventually transmitted their remaining timeout may be so low that the message times out while in flight to the broker. Your run lasts for 303 seconds and the default message.timeout.ms is 300s, so that sort of makes sense.
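As a rough sanity check of that reasoning, the numbers from the log in the report can be plugged into a back-of-the-envelope calculation. This is only a sketch: it assumes all 5000 messages were enqueued almost at once (plausible given the very large queue limits in the posted config) and that the client drains the queue at the average rate observed over the run. The helper name `queue_wait_seconds` is made up for illustration.

```python
# Back-of-the-envelope check using the numbers from the report.
MESSAGE_TIMEOUT_S = 300.0   # default message.timeout.ms (300 s), as stated above
total_messages = 5000
run_seconds = 303.98        # from the "Processed 5000 messsages" log line

# Average rate at which messages actually left the client.
throughput = total_messages / run_seconds


def queue_wait_seconds(position: int, msgs_per_sec: float) -> float:
    """Approximate time the message at `position` waits in the local queue."""
    return position / msgs_per_sec


# Assumption: every message was enqueued at roughly t = 0.
last_wait = queue_wait_seconds(total_messages, throughput)
remaining = MESSAGE_TIMEOUT_S - last_wait

print(f"throughput: {throughput:.2f} msgs/s")
print(f"last message waits about {last_wait:.0f} s in the queue")
print(f"timeout budget left when it is sent: {remaining:.0f} s")
```

Under those assumptions the messages at the tail of the queue have little or no timeout budget left by the time they reach the wire, which would be consistent with requests timing out after roughly one second in flight.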

If you reduce the producer queue size you will get quicker backpressure (produce() will fail with a queue-full error, which the Python client surfaces as BufferError), and you can stop producing until there is room in the queue.
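A minimal sketch of that backpressure loop might look like the following. The function name `produce_with_backpressure` and the `poll_timeout` parameter are illustrative, not part of the library; `producer` is assumed to be a `confluent_kafka.Producer` (or anything exposing the same `produce()`, `poll()` and `flush()` methods).

```python
def produce_with_backpressure(producer, topic, payloads, poll_timeout=1.0):
    """Produce every payload, pausing whenever the local queue is full.

    Returns whatever flush() returns, i.e. the number of messages still
    queued or in flight (0 when everything was delivered).
    """
    for payload in payloads:
        while True:
            try:
                producer.produce(topic, value=payload)
                break
            except BufferError:
                # Local queue is full: serve delivery callbacks and give
                # the client time to send before retrying this payload.
                producer.poll(poll_timeout)
        # Serve any pending delivery callbacks without blocking.
        producer.poll(0)
    # Wait for outstanding deliveries, as the TERMINATE log line suggests.
    return producer.flush()
```

The final `flush()` also addresses the "Producer terminating with 897 messages ... still in queue or transit" warning in the log, since the process otherwise exits while messages are still waiting to be sent.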

edenhill avatar Apr 06 '21 14:04 edenhill

@edenhill Okay, should I try with linger_ms=5?

AkshayAwate avatar Apr 06 '21 14:04 AkshayAwate

No, rather limit queue.buffering.max.kbytes and queue.buffering.max.messages to only allow for, say, 60 seconds' worth of messages. For example, if your input rate is 1000 messages per second, set queue.buffering.max.messages to 60000.
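The sizing rule can be written down as a small helper. This is a sketch only: `queue_limits` is a hypothetical name, the average message size is something you have to measure yourself, and the 60-second window is just the figure suggested above.

```python
import math


def queue_limits(msgs_per_sec, avg_msg_bytes, seconds=60):
    """Queue settings that hold roughly `seconds` worth of messages."""
    max_messages = math.ceil(msgs_per_sec * seconds)
    # queue.buffering.max.kbytes is expressed in kilobytes.
    max_kbytes = math.ceil(max_messages * avg_msg_bytes / 1024)
    return {
        "queue.buffering.max.messages": max_messages,
        "queue.buffering.max.kbytes": max_kbytes,
    }


# The example from the comment above: 1000 msgs/s, assuming 1 KiB messages.
print(queue_limits(1000, 1024))

# The rate observed in the report (16.45 msgs/s), assuming ~1 MB messages.
print(queue_limits(16.45, 1_000_000))
```

The resulting dictionary can be merged into the producer config in place of the very large queue.buffering.* values, so that produce() starts reporting a full queue long before queued messages approach message.timeout.ms.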

edenhill avatar Apr 06 '21 14:04 edenhill

@edenhill I will try and update.

AkshayAwate avatar Apr 06 '21 14:04 AkshayAwate

@edenhill So the main thing is that I am using image bytes as the payload.

AkshayAwate avatar Apr 06 '21 14:04 AkshayAwate

Just wondering whether this issue was resolved. I am facing something similar and wanted to ask if you ever found a resolution. Thanks!

Ypsingh18 avatar Aug 16 '21 16:08 Ypsingh18

Having the same problem. Was this solved?

adrianguyareach avatar Sep 09 '21 12:09 adrianguyareach

See my previous comments on setting queue sizes.

edenhill avatar Sep 09 '21 12:09 edenhill

[image] I have the same issue. How can I solve this?

arizalpratama avatar Jan 16 '22 00:01 arizalpratama

The main question is already answered by @edenhill. Closing this issue.

pranavrth avatar Mar 12 '24 11:03 pranavrth