
[Bug] TransactionalMessageService may check unrelated producer in the same producer group

Open bxfjb opened this issue 1 month ago • 1 comment

Before Creating the Bug Report

  • [x] I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • [x] I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • [x] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

any

RocketMQ version

develop

JDK Version

any

Describe the Bug

In theory, producers belonging to the same producer group should behave identically, but RocketMQ does not enforce this constraint in code, so users can run several producers that publish messages to different topics under a single producer group. During the transaction message check, the broker selects an active producer channel by calling getAvailableChannel and sends a check request to it to obtain the local transaction state. As shown below, this selection does not distinguish between producers, because the broker assumes they are interchangeable even though nothing guarantees that. As a result, users may misuse the producer group, and the check request may be sent to the wrong producer and return a wrong result, which is exactly what happened in our production environment.

private final ConcurrentMap<String /* group name */, ConcurrentMap<Channel, ClientChannelInfo>> groupChannelTable =
        new ConcurrentHashMap<>();
...
public Channel getAvailableChannel(String groupId) {
    if (groupId == null) {
        return null;
    }
    List<Channel> channelList;
    // Every channel registered under the producer group is a candidate,
    // regardless of which topics that producer actually publishes to.
    ConcurrentMap<Channel, ClientChannelInfo> channelClientChannelInfoHashMap = groupChannelTable.get(groupId);
    if (channelClientChannelInfoHashMap != null) {
        channelList = new ArrayList<>(channelClientChannelInfoHashMap.keySet());
    } else {
        log.warn("Check transaction failed, channel table is empty. groupId={}", groupId);
        return null;
    }

    int size = channelList.size();
    if (0 == size) {
        log.warn("Channel list is empty. groupId={}", groupId);
        return null;
    }

    Channel lastActiveChannel = null;

    // Round-robin over the group's channels, retrying a bounded number of times
    // to find one that is both active and writable.
    int index = positiveAtomicCounter.incrementAndGet() % size;
    Channel channel = channelList.get(index);
    int count = 0;
    boolean isOk = channel.isActive() && channel.isWritable();
    while (count++ < GET_AVAILABLE_CHANNEL_RETRY_COUNT) {
        if (isOk) {
            return channel;
        }
        if (channel.isActive()) {
            lastActiveChannel = channel;
        }
        index = (++index) % size;
        channel = channelList.get(index);
        isOk = channel.isActive() && channel.isWritable();
    }

    return lastActiveChannel;
}

Steps to Reproduce

Run multiple producers in the same producer group and send transactional messages to different topics, for example as in the sketch below.
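
A minimal sketch of this misuse, assuming a local name server at 127.0.0.1:9876; the distinct instance names are only there so both producers can start in one JVM for demonstration (in practice these would usually be separate applications):

import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

public class SameGroupDifferentTopics {

    // Listener whose check logic only knows about its own kind of local transaction.
    static TransactionListener listener(LocalTransactionState checkAnswer) {
        return new TransactionListener() {
            @Override
            public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                // Return UNKNOW so the broker will issue a check-back later.
                return LocalTransactionState.UNKNOW;
            }

            @Override
            public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                return checkAnswer;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // Producer A: "order" transactions, would commit on check.
        TransactionMQProducer producerA = new TransactionMQProducer("TXN_GROUP");
        producerA.setNamesrvAddr("127.0.0.1:9876");
        producerA.setInstanceName("producerA");
        producerA.setTransactionListener(listener(LocalTransactionState.COMMIT_MESSAGE));
        producerA.start();
        producerA.sendMessageInTransaction(new Message("TOPIC_ORDER", "order".getBytes()), null);

        // Producer B: "payment" transactions, same group name, would roll back on check.
        TransactionMQProducer producerB = new TransactionMQProducer("TXN_GROUP");
        producerB.setNamesrvAddr("127.0.0.1:9876");
        producerB.setInstanceName("producerB");
        producerB.setTransactionListener(listener(LocalTransactionState.ROLLBACK_MESSAGE));
        producerB.start();
        producerB.sendMessageInTransaction(new Message("TOPIC_PAYMENT", "payment".getBytes()), null);

        // The broker's later check for the pending TOPIC_ORDER message is routed via
        // getAvailableChannel, which may pick producerB's channel and get ROLLBACK_MESSAGE.
    }
}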

What Did You Expect to See?

Transactional messages should be committed or rolled back correctly.

What Did You See Instead?

Transactional messages may be committed or rolled back incorrectly.

Additional Context

A solution is not clear yet; maybe we can build a lineage mechanism that associates producers with the topics they publish to, as sketched below?
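
Purely as an illustration of that direction (nothing like this exists in the current code), the broker-side table could additionally be keyed by topic, so that a check is only routed to channels that actually produced half messages for that topic:

import io.netty.channel.Channel;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.rocketmq.broker.client.ClientChannelInfo;

// Hypothetical structure: per group, remember which channels have produced half
// messages to which topic, so a check is routed only to related producers.
public class TopicAwareProducerTable {
    private final ConcurrentMap<String /* group */,
        ConcurrentMap<String /* topic */, ConcurrentMap<Channel, ClientChannelInfo>>> table =
            new ConcurrentHashMap<>();

    // Record a producer channel as related to a topic, e.g. when a half message arrives.
    public void recordProducer(String group, String topic, Channel channel, ClientChannelInfo info) {
        table.computeIfAbsent(group, g -> new ConcurrentHashMap<>())
             .computeIfAbsent(topic, t -> new ConcurrentHashMap<>())
             .put(channel, info);
    }

    // Candidate channels for a transaction check, restricted to producers of that topic.
    public List<Channel> candidates(String group, String topic) {
        ConcurrentMap<String, ConcurrentMap<Channel, ClientChannelInfo>> byTopic = table.get(group);
        if (byTopic == null) {
            return Collections.emptyList();
        }
        ConcurrentMap<Channel, ClientChannelInfo> channels = byTopic.get(topic);
        return channels == null ? Collections.emptyList() : new ArrayList<>(channels.keySet());
    }
}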

bxfjb · Oct 30 '25 08:10

Your analysis is excellent! My idea is that users can implement appropriate state-checking logic in the transaction status check method TransactionListener.checkLocalTransaction(MessageExt msg), for example by using a distributed state cache (such as managing shared transaction state for the group via Redis), so that all producers in the same producer group return the same check result. This aligns with RocketMQ's design principle of consistent behavior within a producer group; it may cost a little concurrent performance, but it preserves design flexibility.
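
A minimal sketch of that idea, assuming a Jedis client; the Redis endpoint, key prefix, and TTL are illustrative choices, not part of RocketMQ:

import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;
import redis.clients.jedis.JedisPooled;

// Shares local-transaction outcomes through Redis so that every producer in the
// group answers a broker check the same way, regardless of which producer actually
// executed the local transaction.
public class SharedStateTransactionListener implements TransactionListener {
    private final JedisPooled redis = new JedisPooled("localhost", 6379); // illustrative endpoint

    @Override
    public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
        String txId = msg.getTransactionId();
        try {
            // ... run the actual local transaction here (placeholder) ...
            redis.setex("tx:" + txId, 3600, LocalTransactionState.COMMIT_MESSAGE.name());
            return LocalTransactionState.COMMIT_MESSAGE;
        } catch (Exception e) {
            redis.setex("tx:" + txId, 3600, LocalTransactionState.ROLLBACK_MESSAGE.name());
            return LocalTransactionState.ROLLBACK_MESSAGE;
        }
    }

    @Override
    public LocalTransactionState checkLocalTransaction(MessageExt msg) {
        // Any producer in the group can answer, because the state lives in Redis,
        // not in the memory of the producer that executed the transaction.
        String state = redis.get("tx:" + msg.getTransactionId());
        return state == null ? LocalTransactionState.UNKNOW : LocalTransactionState.valueOf(state);
    }
}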

contrueCT · Nov 05 '25 08:11