cassandra Lightweight transactions fail due to incorrect IP binding

This is very similar to #150, but #151 didn't fix it for this case.

I'm using Cassandra with docker stack deploy in swarm mode. If I specify a ports: section in the stack YAML, _ip_address returns a different address than when I leave ports: out. With ports:, lightweight transactions (LWT) fail; without it, they work. I locally hacked the _ip_address function to only return addresses that start with 10.0., and that seemed to fix things for me, but I don't know whether that's the right way to go.

Oct 31 '18 14:10 sourada-e5
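A minimal sketch of a less brittle version of the "starts with 10.0." hack, assuming you know the subnet of the overlay network Cassandra should use (10.0.0.0/24 for the test network in the reproduction below; docker network inspect shows it). The function names are hypothetical and are not part of the image's docker-entrypoint.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: select the container address inside a known CIDR
# instead of matching the string prefix "10.0.". The CIDR is an assumption
# the operator must supply (e.g. the overlay network's subnet).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ipv4_to_int() {
	local IFS=.
	local -a o
	read -r -a o <<< "$1"
	# 10# forces base 10 so octets like "08" are not read as octal.
	echo "$(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))"
}

# Succeed if IPv4 address $1 lies inside CIDR $2 (e.g. 10.0.0.0/24).
ipv4_in_cidr() {
	local addr net bits mask
	addr="$(ipv4_to_int "$1")"
	net="$(ipv4_to_int "${2%/*}")"
	bits="${2#*/}"
	mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
	(( (addr & mask) == (net & mask) ))
}

# Print the first address read from stdin (one per line) that lies in CIDR $1.
first_in_cidr() {
	local addr
	while read -r addr; do
		[ -n "$addr" ] || continue   # skip blank lines
		if ipv4_in_cidr "$addr" "$1"; then
			printf '%s\n' "$addr"
			return 0
		fi
	done
	return 1
}
```

Usage, assuming iproute2's ip is available in the container: ip -o -4 address | awk '{ split($4, p, "/"); print p[1] }' | first_in_cidr 10.0.0.0/24. Matching a CIDR instead of a string prefix avoids false hits such as 10.0.100.5 against a 10.0.0.0/24 overlay, but the subnet still has to be known in advance.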

Wow, this is hairy. This is partially Cassandra's fault for being so overzealous about exact explicit IP addresses everywhere (to the point of making them part of the protocols directly), but also partly Docker's fault for providing such complicated networking that it becomes somewhere between very hard and impossible to determine "the container's IP address".

Just to hopefully help anyone who wants to dig into this more, I've reproduced this by doing the following simple steps:

$ docker network create --driver overlay --attachable test
$ docker service create --network test --name test --publish 1234:1234 cassandra

Then I ran docker exec -it test.1.xxxxxxxxxxxxxxxxxxxxx bash to collect the following:

root@d1d0dec303e6:/# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
191580: eth0@if191581: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:ff:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.255.0.4/16 brd 10.255.255.255 scope global eth0
       valid_lft forever preferred_lft forever
191582: eth2@if191583: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:14:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet 172.20.0.3/16 brd 172.20.255.255 scope global eth2
       valid_lft forever preferred_lft forever
191584: eth1@if191585: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:00:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 10.0.0.4/24 brd 10.0.0.255 scope global eth1
       valid_lft forever preferred_lft forever

root@d1d0dec303e6:/# ip route
default via 172.20.0.1 dev eth2 
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.4 
10.255.0.0/16 dev eth0 proto kernel scope link src 10.255.0.4 
172.20.0.0/16 dev eth2 proto kernel scope link src 172.20.0.3

Additionally, here's the relevant section from docker container inspect:

{
  "Networks": {
    "ingress": {
      "IPAMConfig": {
        "IPv4Address": "10.255.0.4"
      },
      "Links": null,
      "Aliases": [
        "d1d0dec303e6"
      ],
      "NetworkID": "88zat916cnaig63d03u095rc5",
      "EndpointID": "d3290f8fd19a82c01cd4d29cffbaa87568a8ceed2d02821f3b8c2585704aefb4",
      "Gateway": "",
      "IPAddress": "10.255.0.4",
      "IPPrefixLen": 16,
      "IPv6Gateway": "",
      "GlobalIPv6Address": "",
      "GlobalIPv6PrefixLen": 0,
      "MacAddress": "02:42:0a:ff:00:04",
      "DriverOpts": null
    },
    "test": {
      "IPAMConfig": {
        "IPv4Address": "10.0.0.4"
      },
      "Links": null,
      "Aliases": [
        "d1d0dec303e6"
      ],
      "NetworkID": "lr0ptm8trd7ceh70277s2ggtm",
      "EndpointID": "aaca3019d270b6caf9f4b89023a0702be7f66d171f66c3b06efaf8d93fc2156f",
      "Gateway": "",
      "IPAddress": "10.0.0.4",
      "IPPrefixLen": 24,
      "IPv6Gateway": "",
      "GlobalIPv6Address": "",
      "GlobalIPv6PrefixLen": 0,
      "MacAddress": "02:42:0a:00:00:04",
      "DriverOpts": null
    }
  }
}

So, in short, our container has no fewer than three candidate IP addresses, and I can't see any way for us to tell them apart automatically:

172.20.0.3 is on docker_gwbridge (so it carries our default route, i.e. our access to the internet)
10.255.0.4 is on ingress (so it is how Docker routes traffic for the port we published to us)
10.0.0.4 is our IP address on the test overlay network

All of those CIDRs are configurable, and nothing on the interfaces themselves is "telling" about which is which (the interface holding 172.20.0.3 has a different MTU, 1500 versus 1450 on the two overlay interfaces, but that is hardly telling and certainly not safe to rely on).

I'm not sure what we should do here. :disappointed: :confused:

Nov 01 '18 21:11 tianon
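To make the analysis above concrete, here is a sketch (hypothetical helper names, parsing iproute2's one-line output format) that drops loopback and the interface carrying the default route, which is docker_gwbridge in the reproduction. Run against that container, it would still leave two candidates, eth0 10.255.0.4 (ingress) and eth1 10.0.0.4 (the test overlay), which is exactly the ambiguity described.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they only parse text, so they can be tried on saved output.

# Print the interface named after "dev" on the default route.
# Reads `ip route` output on stdin.
default_route_dev() {
	awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Print "<interface> <address>" for each IPv4 address except loopback and
# the interface given as $1. Reads `ip -o -4 address` output on stdin.
candidate_ipv4s() {
	awk -v skip="$1" '$3 == "inet" {
		dev = $2
		sub(/@.*/, "", dev)   # drop a possible "@ifNNN" peer suffix
		if (dev == "lo" || dev == skip) next
		split($4, p, "/")     # strip the /prefix length
		print dev, p[1]
	}'
}
```

Usage inside the container: ip -o -4 address | candidate_ipv4s "$(ip route | default_route_dev)".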

I wish I had a good idea to contribute, but I haven't thought of one. We're going to work around it for now with some hackery in our Docker Compose file, since within our environment we can assume the Swarm network is using the default IP settings.

It would be nice if Docker had a magic IP you could query, like the 169.254.169.254 one that Amazon has in EC2, to get metadata...

Nov 12 '18 21:11 sourada-e5

I found this YAML to work:

version: '3'
services:
   cassandra-1:
      image: cassandra
      deploy:
         placement:
            constraints:
            - node.labels.application==cassandra1
      environment:
 #        CASSANDRA_BROADCAST_ADDRESS: "cassandra-1"
          CASSANDRA_LISTEN_ADDRESS: tasks.cassandra-1
      ports:
        - 7000
      volumes:
        - "/volume/cassandra:/var/lib/cassandra"
      networks:
        - cassandra 
   cassandra-2:
      image: cassandra
      deploy:
         placement:
            constraints:
            - node.labels.application==cassandra2
      environment:
#         CASSANDRA_BROADCAST_ADDRESS: "cassandra-2"
          CASSANDRA_LISTEN_ADDRESS: tasks.cassandra-2
          CASSANDRA_SEEDS: "tasks.cassandra-1"
      depends_on:
        - "cassandra-1"
      ports:
        - 7000
      volumes:
        - "/volume/cassandra:/var/lib/cassandra"
      networks:
       - cassandra
  
networks:
  cassandra:
    external:
     name: cassandra-net

Using tasks.cassandra-n (instead of just cassandra-n) as the host name in CASSANDRA_LISTEN_ADDRESS and CASSANDRA_SEEDS does the trick. I'm not sure exactly why.

$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false


root@20237c4bbe35:/# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.0.138  280.6 KiB  256          100.0%            551823e3-c3d2-4bcd-97e3-ec325e070ff8  rack1
UN  192.168.0.140  325.18 KiB  256          100.0%            c3bbfac8-a877-4cca-bc83-9120fbf067c3  rack1

root@20237c4bbe35:/#

Dec 11 '18 16:12 gsliskov

> tasks.cassandra-n in service name (instead of just cassandra-n) does the trick. Not sure exactly why.

@gsliskov you are great. What does tasks.cassandra-n mean? How did you find this?

Dec 29 '18 09:12 ft115637850

tasks.<service name> is the "API" for getting the real IPs of all tasks within the selected service: https://docs.docker.com/network/overlay/#container-discovery

On an overlay network, Docker also assigns each service a virtual IP (VIP), which it load-balances across all tasks (containers) of the service. If you don't want Docker to create this extra network abstraction, change the endpoint mode to dnsrr; the service name then resolves directly to the task IPs, so this should work the same as using tasks.x:

version: '3.3'
services:
   cassandra-1:
      image: cassandra
      deploy:
         endpoint_mode: dnsrr
         placement:
            constraints:
            - node.labels.application==cassandra1
      environment:
         CASSANDRA_BROADCAST_ADDRESS: "cassandra-1"
      ports:
        - 7000
      volumes:
        - "/volume/cassandra:/var/lib/cassandra"
      networks:
        - cassandra 
   cassandra-2:
      image: cassandra
      deploy:
         endpoint_mode: dnsrr
         placement:
            constraints:
            - node.labels.application==cassandra2
      environment:
         CASSANDRA_BROADCAST_ADDRESS: cassandra-2
         CASSANDRA_LISTEN_ADDRESS: cassandra-2
         CASSANDRA_SEEDS: "cassandra-1"
      depends_on:
        - "cassandra-1"
      ports:
        - 7000
      volumes:
        - "/volume/cassandra:/var/lib/cassandra"
      networks:
       - cassandra
  
networks:
  cassandra:
    external:
     name: cassandra-net

Dec 31 '18 20:12 yosifkit
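Because tasks.<service name> resolves to the IP of every task in the service, the lookup can also feed CASSANDRA_SEEDS directly. A sketch with a hypothetical helper name, assuming glibc's getent is available in the container and keeping in mind that only tasks already running at lookup time are returned:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a comma-separated seed list from the task IPs
# that Swarm's embedded DNS returns for tasks.<service>.
seeds_from_tasks() {
	local service="$1"
	# getent ahostsv4 repeats each address once per socket type
	# (STREAM, DGRAM, RAW); keep the first occurrence and join with commas.
	getent ahostsv4 "tasks.$service" |
		awk '!seen[$1]++ { printf "%s%s", sep, $1; sep = "," } END { print "" }'
}
```

For example: export CASSANDRA_SEEDS="$(seeds_from_tasks cassandra-1)". Using every task as a seed may not be what you want in larger clusters; pick a subset if so.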

We also run Docker in swarm mode. I get the container's IP address by resolving the container's hostname. I replaced the _ip_address function in docker-entrypoint.sh with this:

_ip_address() {
	getent hosts "$HOSTNAME" | awk '{ print $1 }'
}

I don't know whether it works in other Docker environments.

Apr 30 '19 14:04 janpaulus
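A sketch of a variant of the getent-based _ip_address above, under two assumptions: only an IPv4 address should be advertised, and if $HOSTNAME does not resolve, falling back to the first non-loopback address from ip -o -4 address is acceptable (that fallback can pick the ingress address, as in the earlier reproduction). ahostsv4 is a glibc getent database that returns only IPv4 results.

```shell
#!/usr/bin/env bash
# Hypothetical replacement for _ip_address in a customized entrypoint.
_ip_address() {
	local addr
	# getent ahostsv4 prints "ADDRESS SOCKTYPE [NAME]" lines; keep the first address.
	addr="$(getent ahostsv4 "$HOSTNAME" | awk '{ print $1; exit }')"
	if [ -z "$addr" ]; then
		# Fallback (assumption): first non-loopback IPv4 address, prefix length stripped.
		addr="$(ip -o -4 address | awk '$2 != "lo" { split($4, p, "/"); print p[1]; exit }')"
	fi
	printf '%s\n' "$addr"
}
```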

I've been having the same issue. I have 3 Cassandra nodes on my development machine.

This was helpful:

  environment:
      CASSANDRA_LISTEN_ADDRESS: tasks.cassandra-2
      CASSANDRA_SEEDS: "tasks.cassandra-1"

Adding the tasks. prefix makes it work for Docker Swarm, but now Docker Compose doesn't like it:

Unable to bind to address tasks.node1/92.242.132.16:7000. Set listen_address in cassandra.yaml to an interface you can bind to, e.g., your private IP address on EC2
Fatal configuration error; unable to start server.

Do I need to maintain two separate compose files, one for Docker Swarm and one for Docker Compose?

May 13 '20 14:05 Bananas-Are-Yellow
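One way to avoid maintaining two compose files, sketched under the assumption that you wrap the image's entrypoint in your own script: try tasks.<service> first and keep it only if it resolves to one of the container's own addresses. In the error above, tasks.node1 resolved to 92.242.132.16, which is not a local address, so Cassandra could not bind to it; the sketch would then fall back to the plain service name. resolve_ipv4 and pick_listen_name are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing CASSANDRA_LISTEN_ADDRESS so that one
# compose file can work under both docker stack deploy and docker-compose.

# Print the first IPv4 address that NAME ($1) resolves to (glibc getent).
resolve_ipv4() {
	getent ahostsv4 "$1" | awk '{ print $1; exit }'
}

# $1: space-separated IPv4 addresses of this container.
# Remaining args: candidate host names, in order of preference.
# Prints the first candidate that resolves to one of the local addresses.
pick_listen_name() {
	local local_addrs=" $1 " name addr
	shift
	for name in "$@"; do
		addr="$(resolve_ipv4 "$name")"
		if [ -n "$addr" ] && [[ "$local_addrs" == *" $addr "* ]]; then
			printf '%s\n' "$name"
			return 0
		fi
	done
	return 1
}
```

In a wrapper, with SERVICE_NAME as a hypothetical variable you set per service, you could compute local_addrs="$(ip -o -4 address | awk '{ split($4, p, "/"); printf "%s ", p[1] }')" and then export CASSANDRA_LISTEN_ADDRESS="$(pick_listen_name "$local_addrs" "tasks.$SERVICE_NAME" "$SERVICE_NAME")" before running the original docker-entrypoint.sh.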
