[fix][client] Fix the blocked producer due to chunking when blockIfQueueFull is enabled
Fixes #17446
Motivation
A producer may be blocked permanently when it sends chunked messages with blockIfQueueFull enabled.
The cause of this bug is the way the semaphore permits for chunked messages are acquired:
https://github.com/apache/pulsar/blob/359cfa7bc05775bf6dd004f21b9907610ed3b3d5/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L520-L527
When a large message is split into a large number of chunks (i.e. the message is too big or chunkMaxMessageSize is set too small), the message acquires all the remaining semaphore permits and still needs more. The send (send() or sendAsync()) of the large message then blocks on permits that only the message itself holds, so it is blocked forever.
More generally, whenever blockIfQueueFull, maxPendingMessages, and chunking are all enabled at the same time, this risk of deadlock exists even if the number of chunks of a single message is not very large.
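The self-blocking behaviour described above can be illustrated with a plain `java.util.concurrent.Semaphore`. This is a minimal sketch, not the code in ProducerImpl: the class `ChunkPermitDemo` and the method `canSendAllChunks` are hypothetical names, and the model assumes (following the description above) that permits taken by earlier chunks are held until the whole message has been sent. It uses the non-blocking `tryAcquire()` so that the demo reports the point where a blocking producer would wait instead of hanging.

```java
import java.util.concurrent.Semaphore;

// Hypothetical demo class; it models the permit accounting only, not Pulsar itself.
final class ChunkPermitDemo {

    /**
     * Tries to take one permit per chunk of a single message.
     * Returns false at the point where a blocking acquire() would wait.
     */
    static boolean canSendAllChunks(Semaphore permits, int totalChunks) {
        int held = 0;
        for (int chunkId = 0; chunkId < totalChunks; chunkId++) {
            if (!permits.tryAcquire()) {
                // Assumption of this model: the permits already held by this
                // message are released only once the whole message is sent,
                // so a blocking acquire() here would wait on itself forever.
                permits.release(held); // undo, so the demo semaphore stays usable
                return false;
            }
            held++;
        }
        permits.release(held); // the message "completed"; give the permits back
        return true;
    }

    public static void main(String[] args) {
        Semaphore permits = new Semaphore(4); // stands in for maxPendingMessages = 4
        System.out.println(canSendAllChunks(permits, 3));  // 3 chunks fit into 4 permits
        System.out.println(canSendAllChunks(permits, 10)); // 10 chunks can never fit
    }
}
```

In this model, a message that needs more chunks than `maxPendingMessages` can never make progress, no matter how long it waits.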
Modifications
When chunking is enabled, blockIfQueueFull is always treated as disabled.
Verifying this change
- [x] Make sure that the change passes the CI checks.
This change added tests and can be verified as follows:
- Added integration tests for end-to-end deployment with large payloads (10MB)
Does this pull request potentially affect one of the following parts:
If yes was chosen, please highlight the changes
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API: (yes / no)
- The schema: (yes / no / don't know)
- The default values of configurations: (yes / no)
- The wire protocol: (yes / no)
- The rest endpoints: (yes / no)
- The admin cli options: (yes / no)
- Anything that affects deployment: (yes / no / don't know)
Documentation
Check the box below or label this PR directly.
Need to update docs?
- [ ] doc-required (Your PR needs to update docs and you will update later)
- [x] doc-not-needed (Please explain why)
- [ ] doc (Your PR contains doc changes)
- [ ] doc-complete (Docs have been already added)
Does this happen in real-world scenarios? Even if a chunked message takes all the remaining send permits, the sends of earlier messages will complete and release their permits. The exception is when one big chunked message takes up all available permits, but that looks more like a configuration issue. Should the maxPendingMessages count be increased instead?
Hi @MarvinCai. It does not happen only with one big chunked message. If a client sends big chunked messages concurrently, it is easier for them to take all the remaining send permits, after which no more permits can be released. The most important point is that the sending of a chunk may acquire permits that it cannot release, so the risk remains even with an increased maxPendingMessages.
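The concurrent case from the reply above can be sketched in the same simplified way. The class `ConcurrentChunkDemo` and the method `allSendersStuck` are hypothetical, and the sketch assumes that the permit acquisitions of different senders can interleave one chunk at a time, and that a message gives its permits back only after all of its chunks have been accepted. The interleaving is simulated round-robin in a single thread so that the result is deterministic.

```java
import java.util.concurrent.Semaphore;

// Hypothetical demo class; it models permit accounting only, not Pulsar itself.
final class ConcurrentChunkDemo {

    /**
     * Simulates several senders, each sending one chunked message, taking
     * permits one chunk at a time in round-robin order.
     * Returns true if no sender manages to get permits for all of its chunks.
     */
    static boolean allSendersStuck(int maxPending, int senders, int chunksPerMessage) {
        Semaphore permits = new Semaphore(maxPending);
        int[] held = new int[senders];
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int s = 0; s < senders; s++) {
                // tryAcquire() marks the spot where a blocking producer would wait.
                if (held[s] < chunksPerMessage && permits.tryAcquire()) {
                    held[s]++;
                    progress = true;
                }
            }
        }
        for (int h : held) {
            if (h == chunksPerMessage) {
                return false; // this sender could finish and release its permits
            }
        }
        return true; // every sender holds some permits and waits for more
    }

    public static void main(String[] args) {
        // Each message needs only 3 of the 4 permits, yet two senders get stuck.
        System.out.println(allSendersStuck(4, 2, 3));
        // With 6 permits both messages fit, so nobody is stuck.
        System.out.println(allSendersStuck(6, 2, 3));
    }
}
```

Under these assumptions, raising maxPendingMessages only moves the threshold: enough concurrent chunked messages can still end up holding all the permits between them.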
The pr had no activity for 30 days, mark with Stale label.