
autodiscovery: true opens connections to new nodes in other data centers

hubertpompecki opened this issue

The documentation states that:

When nodes in the cluster are discovered, a Xandra pool of connections is started for each node that is in the same datacenter as one of the nodes in :nodes.

The above is true at startup but doesn't appear to be true later at runtime. When autodiscovery is set to true, Xandra opens connections to all nodes as soon as they become available even if the new node and the control connection are in different data centers.

Replication steps are given below.

Given the description above, I'd guess this is a bug. I think when Xandra receives a NEW_NODE event it should consult system tables to check the location of the new node before starting a connection pool.
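
Roughly, the check could look something like this (just a sketch against the system tables, assuming we hold a plain Xandra connection to the control node; this is not Xandra's actual internals):

defmodule DatacenterCheck do
  # `control_conn` is a plain Xandra connection to the control node;
  # `peer_ip` is the address from the NEW_NODE event, e.g. {172, 25, 0, 7}.
  def same_datacenter?(control_conn, peer_ip) do
    # Data center of the node we are connected to.
    [%{"data_center" => local_dc}] =
      control_conn
      |> Xandra.execute!("SELECT data_center FROM system.local")
      |> Enum.to_list()

    # Data center of the newly announced peer, if gossip already knows about it.
    peer_rows =
      control_conn
      |> Xandra.execute!("SELECT data_center FROM system.peers WHERE peer = ?", [{"inet", peer_ip}])
      |> Enum.to_list()

    case peer_rows do
      [%{"data_center" => peer_dc}] -> peer_dc == local_dc
      [] -> false
    end
  end
end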

I'm happy to look into fixing this if this idea sounds reasonable.

Replicating locally

  1. Start east Cassandra data center, 2 nodes:
$ docker network create xandra-test-network
$ docker run -d --name cassandra-east-1 --network xandra-test-network -e CASSANDRA_DC=east -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e MAX_HEAP_SIZE=1G -e HEAP_NEWSIZE=256M cassandra:4.0.1
$ docker run -d --name cassandra-east-2 --network xandra-test-network -e CASSANDRA_DC=east -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=cassandra-east-1 -e MAX_HEAP_SIZE=1G -e HEAP_NEWSIZE=256M cassandra:4.0.1
  2. Start west Cassandra data center, 2 nodes:
$ docker run -d --name cassandra-west-1 --network xandra-test-network -e CASSANDRA_DC=west -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=cassandra-east-1 -e MAX_HEAP_SIZE=1G -e HEAP_NEWSIZE=256M cassandra:4.0.1
$ docker run -d --name cassandra-west-2 --network xandra-test-network -e CASSANDRA_DC=west -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=cassandra-east-1 -e MAX_HEAP_SIZE=1G -e HEAP_NEWSIZE=256M cassandra:4.0.1
  3. Check cluster state
$ docker exec -it cassandra-east-1 bash
root@099d258b5d2d:/# nodetool status
Datacenter: east
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.25.0.2  74.1 KiB   16      51.4%             81551143-3bf3-46ce-97db-c084094c7b20  rack1
UN  172.25.0.3  69.07 KiB  16      50.8%             9b430614-fdb8-45dc-8295-bf5bd4d3d2ec  rack1

Datacenter: west
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.25.0.5  69.06 KiB  16      49.1%             705c218d-09c8-4701-ac18-6a965955f132  rack1
UN  172.25.0.4  74.04 KiB  16      48.7%             48df89ca-cfee-4c95-833d-67ee1ca80510  rack1
  4. Start an Elixir container in the same Docker network
$ docker run -it --rm --name elixir --network xandra-test-network elixir:1.13.2
  5. Connect to the Cassandra cluster using a node from the east data center
iex(1)> Mix.install([{:xandra, "~> 0.11"}])
iex(2)> {:ok, cluster} = Xandra.Cluster.start_link(nodes: ["cassandra-east-1"], pool_size: 5)
13:21:17.232 [debug] Control connection for 172.25.0.2:9042 is up

13:21:17.235 [debug] Started connection to {172, 25, 0, 2}

13:21:17.235 [debug] Discovered peers: [{172, 25, 0, 3}]

13:21:17.236 [debug] Started connection to {172, 25, 0, 3}
iex(3)> :sys.get_state(cluster)
%Xandra.Cluster{
  autodiscovered_nodes_port: 9042,
  autodiscovery: true,
  load_balancing: :random,
  node_refs: [{#Reference<0.4050667325.154927106.225421>, {172, 25, 0, 2}}],
  options: [
    protocol_module: Xandra.Protocol.V3,
    idle_interval: 30000,
    protocol_version: :v3,
    pool_size: 5
  ],
  pool_supervisor: #PID<0.741.0>,
  pools: %{{172, 25, 0, 2} => #PID<0.744.0>, {172, 25, 0, 3} => #PID<0.751.0>}
}

Notice that the client started connections to both nodes in the east data center, as expected.

  6. Start another node in the west data center
$ docker run -d --name cassandra-west-3 --network xandra-test-network -e CASSANDRA_DC=west -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=cassandra-east-1 -e MAX_HEAP_SIZE=1G -e HEAP_NEWSIZE=256M cassandra:4.0.1

The new cluster status is

root@099d258b5d2d:/# nodetool status
Datacenter: east
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.25.0.2  74.1 KiB   16      43.7%             81551143-3bf3-46ce-97db-c084094c7b20  rack1
UN  172.25.0.3  69.07 KiB  16      45.7%             9b430614-fdb8-45dc-8295-bf5bd4d3d2ec  rack1

Datacenter: west
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.25.0.5  69.06 KiB  16      33.3%             705c218d-09c8-4701-ac18-6a965955f132  rack1
UN  172.25.0.7  93.45 KiB  16      42.1%             9663d344-485a-4023-907c-fb5a9d6717f9  rack1
UN  172.25.0.4  74.04 KiB  16      35.3%             48df89ca-cfee-4c95-833d-67ee1ca80510  rack1

And Xandra logs the following:

13:23:43.883 [debug] Received event: %Xandra.Cluster.TopologyChange{address: {172, 25, 0, 7}, effect: "NEW_NODE", port: 9042}

13:23:43.883 [debug] Received event: %Xandra.Cluster.StatusChange{address: {172, 25, 0, 7}, effect: "UP", port: 9042}

13:23:43.885 [debug] Started connection to {172, 25, 0, 7}

There is a new connection pool for the new node:

iex(4)> :sys.get_state(cluster)
%Xandra.Cluster{
  autodiscovered_nodes_port: 9042,
  autodiscovery: true,
  load_balancing: :random,
  node_refs: [{#Reference<0.4050667325.154927106.225421>, {172, 25, 0, 2}}],
  options: [
    protocol_module: Xandra.Protocol.V3,
    idle_interval: 30000,
    protocol_version: :v3,
    pool_size: 5
  ],
  pool_supervisor: #PID<0.741.0>,
  pools: %{
    {172, 25, 0, 2} => #PID<0.744.0>,
    {172, 25, 0, 3} => #PID<0.751.0>,
    {172, 25, 0, 7} => #PID<0.759.0>  # <--- New connection pool
  }
}

hubertpompecki avatar Feb 07 '22 13:02 hubertpompecki

@hubertpompecki this is very interesting, thanks for the callout!

I'm trying to understand why this would be considered a bug. Is it standard C* client behavior to only autodiscover nodes in the same datacenter?

I'm wondering if instead we should control this behavior with an option, for example autodiscovery: true, autodiscover_only_in_same_datacenter: true or something along those lines.
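
Roughly, usage could look something like this (nothing of this exists yet, just sketching the idea):

Xandra.Cluster.start_link(
  nodes: ["cassandra-east-1"],
  autodiscovery: true,
  # hypothetical option name, subject to bikeshedding:
  autodiscover_only_in_same_datacenter: true
)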

Also, hey, thank you so much for offering help! I would definitely appreciate it once we reach a good way to approach this 💟 🙃

whatyouhide avatar Feb 09 '22 07:02 whatyouhide

Thanks, @whatyouhide!

Perhaps a bug is too strong a word, but I hope you'll agree that the behaviour here is somewhat inconsistent:

  • existing nodes in other data centres aren't auto-discovered at startup,
  • new nodes in other data centres are auto-discovered later at runtime as they become available.

The latter creates a problem because the only load balancing option that is available with autodiscovery: true is :random. This means that the client will start sending some queries to these auto-discovered nodes in distant data centres. This, in turn:

  • generates additional latency,
  • may make low-consistency queries less reliable (data replication to distant data centres may be slower, although I'm far from being a Cassandra expert).

This is demonstrated below. Given the following cluster:

root@17c8750820f4:/# nodetool status
Datacenter: east
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.25.0.2  74.11 KiB  16      100.0%            92c20a05-8fd0-4334-a86d-c0f5c5f759e8  rack1

Datacenter: west
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.25.0.4  93.44 KiB  16      100.0%            d6c8b9cd-1aee-47d8-b908-746f017f9f7c  rack1

172.25.0.2 was the seed node. 172.25.0.4 was later brought up at runtime and autodiscovered by Xandra.

iex(2)> {:ok, cluster} = Xandra.Cluster.start_link(nodes: ["cassandra-east-1"], pool_size: 5)
{:ok, #PID<0.740.0>}
iex(3)>
10:53:24.163 [debug] Control connection for 172.25.0.2:9042 is up

10:53:24.166 [debug] Started connection to {172, 25, 0, 2}

10:53:24.166 [debug] Discovered peers: []

10:54:13.881 [debug] Received event: %Xandra.Cluster.TopologyChange{address: {172, 25, 0, 4}, effect: "NEW_NODE", port: 9042}

10:54:13.881 [debug] Received event: %Xandra.Cluster.StatusChange{address: {172, 25, 0, 4}, effect: "UP", port: 9042}

10:54:13.884 [debug] Started connection to {172, 25, 0, 4}
iex(3)> :sys.get_state(cluster)
%Xandra.Cluster{
  autodiscovered_nodes_port: 9042,
  autodiscovery: true,
  load_balancing: :random,
  node_refs: [{#Reference<0.2460134735.578027522.169328>, {172, 25, 0, 2}}],
  options: [
    protocol_module: Xandra.Protocol.V3,
    idle_interval: 30000,
    protocol_version: :v3,
    pool_size: 5
  ],
  pool_supervisor: #PID<0.741.0>,
  pools: %{{172, 25, 0, 2} => #PID<0.744.0>, {172, 25, 0, 4} => #PID<0.751.0>}
}

Queries are now sent to the node in the west data centre as well, even though the client was seeded using the node from the east data centre:

iex(4)> Xandra.Cluster.run(cluster, [], fn conn -> Xandra.execute!(conn, "SELECT * FROM system.local;") end)
#Xandra.Page<[
  rows: [
    %{
      "broadcast_address" => {172, 25, 0, 4},
      "data_center" => "west",
      ...
    }
  ]
]>
iex(4)> Xandra.Cluster.run(cluster, [], fn conn -> Xandra.execute!(conn, "SELECT * FROM system.local;") end)
#Xandra.Page<[
  rows: [
    %{
      "broadcast_address" => {172, 25, 0, 2},
      "data_center" => "east",
      ...
    }
  ]
]>

Opening connections to distant nodes wouldn't be a problem if we had a DC-aware load balancing strategy that only used nodes in other datacenters as a last resort. This appears to be the way other clients work, e.g. the Python driver.

The fact that nodes in other data centres aren't auto-discovered at startup, and that the only available load balancing option is :random, suggests that we shouldn't open connections to nodes in distant data centres at all. However, perhaps the real problem here is that we don't have a datacentre-aware load balancing policy that we could use with autodiscovery. With such a policy, connections to nodes in other data centres wouldn't matter and could even serve as a fallback if local nodes can't be reached for some reason.
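
To sketch the idea (purely illustrative, none of this exists in Xandra today), such a policy could prefer local pools and keep remote ones only as a last resort:

defmodule DCAwarePick do
  # pools: a map of address => {pool_pid, data_center};
  # local_dc: the client's "home" data centre.
  # Illustrative only -- Xandra's pool bookkeeping doesn't track the data centre today.
  def pick(pools, local_dc) do
    {local, remote} =
      pools
      |> Map.values()
      |> Enum.split_with(fn {_pid, dc} -> dc == local_dc end)

    case local do
      # last resort: fall back to a distant data centre
      [] -> remote |> Enum.random() |> elem(0)
      _ -> local |> Enum.random() |> elem(0)
    end
  end
end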

What do you think @whatyouhide?

hubertpompecki avatar Feb 09 '22 11:02 hubertpompecki

Yes, this all makes sense. I would suggest starting by controlling this with an option, such as autodiscover_other_datacenters: true | false, defaulting to false. This should solve your issue and provide a fallback for users that were relying on the current behaviour (provided there are any... 😉 ). Would you be willing to give it a shot and work on a PR?

whatyouhide avatar Feb 15 '22 13:02 whatyouhide

Thanks @whatyouhide

I've been working on a change to introduce a :dc_aware load balancing option in the meantime. It largely overlaps with what autodiscover_other_datacenters would require. I've opened a draft PR with both.
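
Roughly, the intended usage would be along these lines (illustrative only; the exact option names are whatever the draft PR settles on):

{:ok, cluster} =
  Xandra.Cluster.start_link(
    nodes: ["cassandra-east-1"],
    pool_size: 5,
    autodiscovery: true,
    # both of these come from the draft PR, so treat them as provisional:
    load_balancing: :dc_aware,
    autodiscover_other_datacenters: false
  )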

hubertpompecki avatar Feb 18 '22 11:02 hubertpompecki

Hi @hubertpompecki, is the PR ready to use in production? Of course, I will test it in staging first.

We have to use a DC-aware policy because we are resizing our cluster. From what I've seen, your PR is the only solution, since other libraries still don't support it.

awbuana avatar Oct 19 '22 12:10 awbuana