rabbitmq-java-client
rabbitmq-java-client copied to clipboard
newConnection does not shuffle addresses
We noticed that when using java client version 5.11.0 there is a significant skew in client connections towards the first Rabbit node in the list.
It appears that AutoRecovering connection actually shuffles the list of rabbit nodes when creating new connections, but regular connection does not.
public RecoveryAwareAMQConnection newConnection() throws IOException, TimeoutException {
Exception lastException = null;
List<Address> shuffled = shuffle(addressResolver.getAddresses());
There is no shuffle call in the regular ConnectionFactory.newConnection method.
You are welcome to submit a PR that introduces the shuffling for "regular" (initial or non-recovering) connections.
I would not change the default behavior, which is predictable and seems convenient for most people for a very long time. I would not add yet another option to shuffle the addresses either.
I guess the problem is when opening Connection
s one after the other, the first address in the list is always used if the node is up during that period. You should try implementing your own AddressResolver
, that would shuffle a fixed list of addresses when AddressResolver#getAddresses
is called. Something like (I haven't tested the code):
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ShuffledListAddressResolver implements AddressResolver {
private final List<Address> addresses;
public ShuffledListAddressResolver(List<Address> addresses) {
this.addresses = new ArrayList<>(addresses);
}
@Override
public List<Address> getAddresses() throws IOException {
List<Address> shuffledAddresses = new ArrayList<>(this.addresses);
Collections.shuffle(shuffledAddresses);
return shuffledAddresses;
}
}
Yes. I have already fixed the issue for our system using custom AddressResolver. IMO it would be better to fix it for everybody. There is no reason to have a difference in behavior between the auto-recovering connection and the regular one. Most people are more likely to use the auto-recovering connection any way. The issue can be almost invisible with a small number of connections. In our case we have 6 nodes and ~5500 connections. This causes massive skew. The first node may end-up with ~3500 sockets and the rest 5 are with 200-300 sockets each. As far as memory the first one is at 2.8G and the rest are at 250M.
"fix it for everybody" is a bold claim, as nobody reported this "issue" in the last 8 years. I'm hearing the consistency argument, but I'm still reluctant to change a predictable default behavior, especially when there's a simple workaround based on an extension point. It may seem silly but maybe some systems rely on the first node in the list being picked up and we the RabbitMQ team will have to deal with the support effort that such a change can provoke. WDYT @michaelklishin?
We would not mind PRs to add the shuffling address resolver implementation and document it in the API guide though.
I've taken a look at the .Net implementation for comparison. It appears that DefaultEndpointResolver which is the mirror class there does shuffle.
/// The default value creates an instance of the
The shuffle
method is not threadsafe, which will cause the addresses incorrect. It should be taken care of.
What are you referring to? The snippet above? This is no problem once the instance is initialized, each thread calling getAddresses
uses its own copy of the list and shuffles this copy.
The fact that a custom endpoint resolver can be implemented and provided by the user seems sufficient to close it. Most users seemingly have no issues with the resolver provided out of the box.