rabbitmq-java-client icon indicating copy to clipboard operation
rabbitmq-java-client copied to clipboard

newConnection does not shuffle addresses

Open a701440 opened this issue 3 years ago • 7 comments

We noticed that when using java client version 5.11.0 there is a significant skew in client connections towards the first Rabbit node in the list.

It appears that AutoRecovering connection actually shuffles the list of rabbit nodes when creating new connections, but regular connection does not.

public RecoveryAwareAMQConnection newConnection() throws IOException, TimeoutException {
    Exception lastException = null;
    List<Address> shuffled = shuffle(addressResolver.getAddresses());

There is no shuffle call in the regular ConnectionFactory.newConnection method.

a701440 avatar Mar 30 '21 16:03 a701440

You are welcome to submit a PR that introduces the shuffling for "regular" (initial or non-recovering) connections.

michaelklishin avatar Mar 30 '21 17:03 michaelklishin

I would not change the default behavior, which is predictable and seems convenient for most people for a very long time. I would not add yet another option to shuffle the addresses either.

I guess the problem is when opening Connections one after the other, the first address in the list is always used if the node is up during that period. You should try implementing your own AddressResolver, that would shuffle a fixed list of addresses when AddressResolver#getAddresses is called. Something like (I haven't tested the code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffledListAddressResolver implements AddressResolver {

  private final List<Address> addresses;

  public ShuffledListAddressResolver(List<Address> addresses) {
    this.addresses = new ArrayList<>(addresses);
  }

  @Override
  public List<Address> getAddresses() throws IOException {
    List<Address> shuffledAddresses = new ArrayList<>(this.addresses);
    Collections.shuffle(shuffledAddresses);
    return shuffledAddresses;
  }
}

acogoluegnes avatar Mar 31 '21 06:03 acogoluegnes

Yes. I have already fixed the issue for our system using custom AddressResolver. IMO it would be better to fix it for everybody. There is no reason to have a difference in behavior between the auto-recovering connection and the regular one. Most people are more likely to use the auto-recovering connection any way. The issue can be almost invisible with a small number of connections. In our case we have 6 nodes and ~5500 connections. This causes massive skew. The first node may end-up with ~3500 sockets and the rest 5 are with 200-300 sockets each. As far as memory the first one is at 2.8G and the rest are at 250M.

a701440 avatar Mar 31 '21 13:03 a701440

"fix it for everybody" is a bold claim, as nobody reported this "issue" in the last 8 years. I'm hearing the consistency argument, but I'm still reluctant to change a predictable default behavior, especially when there's a simple workaround based on an extension point. It may seem silly but maybe some systems rely on the first node in the list being picked up and we the RabbitMQ team will have to deal with the support effort that such a change can provoke. WDYT @michaelklishin?

We would not mind PRs to add the shuffling address resolver implementation and document it in the API guide though.

acogoluegnes avatar Apr 01 '21 08:04 acogoluegnes

I've taken a look at the .Net implementation for comparison. It appears that DefaultEndpointResolver which is the mirror class there does shuffle.

/// The default value creates an instance of the /// using the list of endpoints passed in. The DefaultEndpointResolver shuffles the /// provided list each time it is requested.

a701440 avatar Apr 01 '21 17:04 a701440

The shuffle method is not threadsafe, which will cause the addresses incorrect. It should be taken care of.

rogersc19 avatar Apr 14 '21 04:04 rogersc19

What are you referring to? The snippet above? This is no problem once the instance is initialized, each thread calling getAddresses uses its own copy of the list and shuffles this copy.

acogoluegnes avatar Apr 19 '21 06:04 acogoluegnes

The fact that a custom endpoint resolver can be implemented and provided by the user seems sufficient to close it. Most users seemingly have no issues with the resolver provided out of the box.

michaelklishin avatar Apr 10 '23 06:04 michaelklishin