pulsar
Possible race condition: ReaderTest.testReadMessageWithBatchingWithMessageInclusive:148 expected [true] but found [false]
Describe the bug
I'm able to reproduce this issue locally, but not consistently. It doesn't seem to behave like the other flaky tests, which fail only in GitHub CI due to timeouts.
I think there might be a race condition because reader.hasMessageAvailable() is evaluating to false when it should be true.
To Reproduce
Run the test several times.
When the test passes, this call in ConsumerImpl.hasMessageAvailable():
hasMoreMessages(lastMessageIdInBroker, lastDequeuedMessage)
evaluates to true.
When the test fails, that call evaluates to false.
The values of lastDequeuedMessage are the same in passing and failing runs, and so are the values of MessageId.latest.
In both passing and failing cases, they look like this:
So, the behavior of ConsumerImpl.hasMoreMessages(..) must differ between passing and failing runs.
In a passing test, in:
    private boolean hasMoreMessages(MessageId lastMessageIdInBroker, MessageId lastDequeuedMessage) {
        if (lastMessageIdInBroker.compareTo(lastDequeuedMessage) > 0 &&
                ((MessageIdImpl) lastMessageIdInBroker).getEntryId() != -1) {
            return true;
        } else {
            // Make sure batching message can be read completely.
            return lastMessageIdInBroker.compareTo(lastDequeuedMessage) == 0
                    && incomingMessages.size() > 0;
        }
    }
incomingMessages.size() has the expected value, and lastMessageIdInBroker.compareTo(lastDequeuedMessage) == 0 evaluates to true.
In the failing test, incomingMessages.size() in hasMoreMessages(..) evaluates to 0 instead of the expected value.
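To make the observation concrete, here is a minimal standalone sketch of the same branch logic. It is not the Pulsar code: the Id class is a hypothetical stand-in for MessageIdImpl that models only ledger and entry ids, and the String queue stands in for the consumer's incomingMessages. It only illustrates that when the two ids compare equal, the result depends entirely on whether the queue has been filled by the time of the call, which is consistent with (but does not prove) a timing-dependent failure.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HasMoreMessagesSketch {

    // Hypothetical stand-in for MessageIdImpl: only ledgerId and entryId are modeled.
    static final class Id implements Comparable<Id> {
        final long ledgerId;
        final long entryId;

        Id(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Id other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }
    }

    // Stand-in for the consumer's receiver queue (assumption: filled by another thread).
    final BlockingQueue<String> incomingMessages = new LinkedBlockingQueue<>();

    // Same branch structure as the method quoted in the issue.
    boolean hasMoreMessages(Id lastMessageIdInBroker, Id lastDequeuedMessage) {
        if (lastMessageIdInBroker.compareTo(lastDequeuedMessage) > 0
                && lastMessageIdInBroker.entryId != -1) {
            return true;
        }
        // Equal ids: the answer now hinges on the queue's size at this instant.
        return lastMessageIdInBroker.compareTo(lastDequeuedMessage) == 0
                && incomingMessages.size() > 0;
    }

    public static void main(String[] args) {
        HasMoreMessagesSketch consumer = new HasMoreMessagesSketch();
        Id same = new Id(1, 0);

        // Queue still empty when the check runs (the failing case described above).
        System.out.println(consumer.hasMoreMessages(same, same)); // false

        // Queue filled before the check runs (the passing case).
        consumer.incomingMessages.add("batched-message");
        System.out.println(consumer.hasMoreMessages(same, same)); // true
    }
}
```

If this reading is right, the check and the delivery of the batch into the queue are not ordered with respect to each other, so the same ids can yield either result from run to run.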
I have a surefire log output with this issue here: org.apache.pulsar.client.impl.ReaderTest-output.txt
I created a repro for an issue that is flaky but fails in most cases. This happens on Pulsar 2.5.0, but not on Pulsar 2.4.2. The description and repro instructions are at #6333
@nodece Could you please help check if this one is related to https://github.com/apache/pulsar/pull/15568
Closed as stale. If it's still a flaky test on master, please open a new issue with the flaky test template.