cassandra
Lightweight transactions fail due to incorrect IP binding
This is very similar to #150, but #151 didn't fix it for this case.
I'm using Cassandra with docker stack deploy in swarm mode. If I specify a ports: section in the stack YAML, _ip_address gives a different address than when I don't have ports:. With ports:, lightweight transactions (LWT) fail; without it, they work fine. I locally hacked the _ip_address function to only find addresses that start with 10.0., and that seemed to fix things for me, but I don't really know if that's the way to go.
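For reference, a prefix hack like that might look roughly like the following. It is a hypothetical sketch, not the actual docker-entrypoint.sh code: pick_ip_by_prefix is a made-up name, and it assumes its input is the output of ip -o -4 addr show (one line per address, with ADDR/LEN in the fourth field).

```shell
# Hypothetical helper: print the first IPv4 address that starts with the
# given string prefix. Reads lines in the format of `ip -o -4 addr show`:
#   "2: eth1    inet 10.0.0.4/24 brd 10.0.0.255 scope global eth1\ ..."
# Usage sketch: ip -o -4 addr show | pick_ip_by_prefix '10.0.'
pick_ip_by_prefix() {
  awk -v p="$1" '
    $3 == "inet" {
      split($4, a, "/")   # "10.0.0.4/24" -> a[1] = "10.0.0.4"
      if (index(a[1], p) == 1) { print a[1]; found = 1; exit }
    }
    # Exit status 0 if an address matched, 1 otherwise.
    END { exit !found }
  '
}
```

A plain string prefix is fragile, though: any other overlay network in 10.0.x.x would match as well, and a shorter prefix such as 10. would also pick up an ingress address like 10.255.0.4.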
Wow, this is hairy. This is partially Cassandra's fault for being so overzealous about exact explicit IP addresses everywhere (to the point of making them part of the protocols directly), but also partly Docker's fault for providing such complicated networking that it becomes somewhere between very hard and impossible to determine "the container's IP address".
Just to hopefully help anyone who wants to dig into this more, I've reproduced this by doing the following simple steps:
$ docker network create --driver overlay --attachable test
$ docker service create --network test --name test --publish 1234:1234 cassandra
Then I did docker exec -it test.1.xxxxxxxxxxxxxxxxxxxxx bash to get the following values:
root@d1d0dec303e6:/# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
191580: eth0@if191581: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:ff:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.255.0.4/16 brd 10.255.255.255 scope global eth0
valid_lft forever preferred_lft forever
191582: eth2@if191583: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:14:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet 172.20.0.3/16 brd 172.20.255.255 scope global eth2
valid_lft forever preferred_lft forever
191584: eth1@if191585: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:00:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 10.0.0.4/24 brd 10.0.0.255 scope global eth1
valid_lft forever preferred_lft forever
root@d1d0dec303e6:/# ip route
default via 172.20.0.1 dev eth2
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.4
10.255.0.0/16 dev eth0 proto kernel scope link src 10.255.0.4
172.20.0.0/16 dev eth2 proto kernel scope link src 172.20.0.3
Additionally, here's the relevant section from docker container inspect:
{
"Networks": {
"ingress": {
"IPAMConfig": {
"IPv4Address": "10.255.0.4"
},
"Links": null,
"Aliases": [
"d1d0dec303e6"
],
"NetworkID": "88zat916cnaig63d03u095rc5",
"EndpointID": "d3290f8fd19a82c01cd4d29cffbaa87568a8ceed2d02821f3b8c2585704aefb4",
"Gateway": "",
"IPAddress": "10.255.0.4",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:0a:ff:00:04",
"DriverOpts": null
},
"test": {
"IPAMConfig": {
"IPv4Address": "10.0.0.4"
},
"Links": null,
"Aliases": [
"d1d0dec303e6"
],
"NetworkID": "lr0ptm8trd7ceh70277s2ggtm",
"EndpointID": "aaca3019d270b6caf9f4b89023a0702be7f66d171f66c3b06efaf8d93fc2156f",
"Gateway": "",
"IPAddress": "10.0.0.4",
"IPPrefixLen": 24,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:0a:00:00:04",
"DriverOpts": null
}
}
}
So in short, our container has no fewer than three candidate IP addresses, and there's really no way I can see for us to differentiate them in an automated way:
- 172.20.0.3 is on docker_gwbridge (so it carries our default route / access to the internet)
- 10.255.0.4 is on ingress (so it's how Docker routes traffic for the port we published to us)
- 10.0.0.4 is our IP address on that test overlay network
All of those CIDRs are configurable, and there's nothing really "telling" about any of them on the interfaces themselves (the 172.20.0.3 interface has a slightly different MTU, but that's not all that telling, and it certainly isn't safe to rely on).
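Since nothing on the interfaces identifies the right network, one possible workaround in the spirit of the 10.0. hack from the original report would be to have the operator state the overlay subnet explicitly and select the address by CIDR membership rather than by string prefix. Below is a minimal POSIX sh + awk sketch, assuming input in the ip -o -4 addr show format; pick_ip_in_cidr is a hypothetical name, and where the CIDR comes from (an environment variable, say) is left open.

```shell
# Hypothetical helper: print the first IPv4 address inside the given CIDR
# (expects "a.b.c.d/len"). Reads lines in the `ip -o -4 addr show` format.
pick_ip_in_cidr() {
  awk -v cidr="$1" '
    # Convert dotted-quad "a.b.c.d" to a number (fits exactly in a double).
    function ip2num(ip,   o) {
      split(ip, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    BEGIN {
      split(cidr, c, "/")
      block = 2 ^ (32 - c[2])          # number of addresses per network of this length
      net = int(ip2num(c[1]) / block)  # which network block the CIDR denotes
    }
    $3 == "inet" {
      split($4, a, "/")                # "10.0.0.4/24" -> a[1] = "10.0.0.4"
      if (int(ip2num(a[1]) / block) == net) { print a[1]; found = 1; exit }
    }
    # Exit status 0 if an address matched, 1 otherwise.
    END { exit !found }
  '
}
```

For the container above, ip -o -4 addr show | pick_ip_in_cidr 10.0.0.0/24 should print 10.0.0.4, while 10.255.0.0/16 would select the ingress address instead. The non-zero exit status on no match would let a caller fall back to the existing behaviour.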
I'm not sure what we should do here. :disappointed: :confused:
I wish I had a good idea to contribute, but I haven't thought of one. We're going to work around it for now with some hackery in our Docker Compose file, since within our environment we can assume the Swarm network is using the default IP settings.
It would be nice if Docker had a magic IP you could query, like the 169.254.169.254 one that Amazon has in EC2, to get metadata...
I found this YAML to work:
version: '3'
services:
  cassandra-1:
    image: cassandra
    deploy:
      placement:
        constraints:
          - node.labels.application==cassandra1
    environment:
      # CASSANDRA_BROADCAST_ADDRESS: "cassandra-1"
      CASSANDRA_LISTEN_ADDRESS: tasks.cassandra-1
    ports:
      - 7000
    volumes:
      - "/volume/cassandra:/var/lib/cassandra"
    networks:
      - cassandra
  cassandra-2:
    image: cassandra
    deploy:
      placement:
        constraints:
          - node.labels.application==cassandra2
    environment:
      # CASSANDRA_BROADCAST_ADDRESS: "cassandra-2"
      CASSANDRA_LISTEN_ADDRESS: tasks.cassandra-2
      CASSANDRA_SEEDS: "tasks.cassandra-1"
    depends_on:
      - "cassandra-1"
    ports:
      - 7000
    volumes:
      - "/volume/cassandra:/var/lib/cassandra"
    networks:
      - cassandra
networks:
  cassandra:
    external:
      name: cassandra-net
Using tasks.cassandra-n as the service name (instead of just cassandra-n) does the trick. I'm not sure exactly why.
$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false
root@20237c4bbe35:/# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.0.138 280.6 KiB 256 100.0% 551823e3-c3d2-4bcd-97e3-ec325e070ff8 rack1
UN 192.168.0.140 325.18 KiB 256 100.0% c3bbfac8-a877-4cca-bc83-9120fbf067c3 rack1
root@20237c4bbe35:/#
@gsliskov you are great. What does tasks.cassandra-n mean? How did you find this?
tasks.<service name> is the "API" for getting the real IPs of all tasks within the selected service: https://docs.docker.com/network/overlay/#container-discovery
On an overlay network, each service is also assigned a virtual IP by Docker, which is then load-balanced across all tasks (containers) of the service. If you don't want Docker to create this extra network abstraction, just change the endpoint mode to dnsrr. This should then work the same as using tasks.x:
version: '3.3'
services:
  cassandra-1:
    image: cassandra
    deploy:
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.labels.application==cassandra1
    environment:
      CASSANDRA_BROADCAST_ADDRESS: "cassandra-1"
    ports:
      # Note: Swarm rejects ingress-mode published ports combined with endpoint_mode: dnsrr;
      # the long syntax (target: 7000, mode: host) may be needed instead.
      - 7000
    volumes:
      - "/volume/cassandra:/var/lib/cassandra"
    networks:
      - cassandra
  cassandra-2:
    image: cassandra
    deploy:
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.labels.application==cassandra2
    environment:
      CASSANDRA_BROADCAST_ADDRESS: cassandra-2
      CASSANDRA_LISTEN_ADDRESS: cassandra-2
      CASSANDRA_SEEDS: "cassandra-1"
    depends_on:
      - "cassandra-1"
    ports:
      - 7000
    volumes:
      - "/volume/cassandra:/var/lib/cassandra"
    networks:
      - cassandra
networks:
  cassandra:
    external:
      name: cassandra-net
We also run Docker in swarm mode. I get the container's IP address by resolving the container's hostname. I replaced the _ip_address function in docker-entrypoint.sh with this:
_ip_address() {
  getent hosts "$HOSTNAME" | awk '{ print $1 }'
}
I do not know if it works in other Docker environments.
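If $HOSTNAME can resolve to an IPv6 address or produce several lines of output, a slightly more defensive variant could restrict the lookup to IPv4 and keep only the first result. This is a sketch assuming a glibc-based image (the ahostsv4 database is a glibc getent feature; other getent implementations may lack it), and, like the function above, it assumes the hostname resolves to the address you actually want Cassandra to use.

```shell
# Sketch only: assumes glibc getent, which provides the "ahostsv4" database.
_ip_address() {
  # Resolve the container's own hostname, IPv4 only. `getent ahostsv4`
  # prints one line per socket type (STREAM, DGRAM, RAW), possibly for
  # several addresses, so print only the address column of the first line.
  getent ahostsv4 "$HOSTNAME" | awk '{ print $1; exit }'
}
```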
I've been having the same issue. I have 3 Cassandra nodes on my development machine.
This was helpful:
environment:
  CASSANDRA_LISTEN_ADDRESS: tasks.cassandra-2
  CASSANDRA_SEEDS: "tasks.cassandra-1"
Adding the tasks. prefix makes it work for Docker Swarm, but now Docker Compose doesn't like it:
Unable to bind to address tasks.node1/92.242.132.16:7000. Set listen_address in cassandra.yaml to an interface you can bind to, e.g., your private IP address on EC2
Fatal configuration error; unable to start server.
Do I need to maintain two separate compose files, one for Docker Swarm and one for Docker Compose?
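One way to avoid two files might be to keep a single file and interpolate the prefix. Both docker-compose and docker stack deploy substitute ${VARIABLE} references from the shell environment, so the YAML could say CASSANDRA_LISTEN_ADDRESS: ${CASSANDRA_NODE_PREFIX}cassandra-2 and CASSANDRA_SEEDS: "${CASSANDRA_NODE_PREFIX}cassandra-1", with a small wrapper setting the variable per target. This is an untested sketch: CASSANDRA_NODE_PREFIX, node_prefix, deploy, the docker-compose.yml file name and the cassandra stack name are all hypothetical.

```shell
# Hypothetical helper: print the hostname prefix the compose file interpolates,
# e.g.  CASSANDRA_LISTEN_ADDRESS: ${CASSANDRA_NODE_PREFIX}cassandra-2
node_prefix() {
  case "$1" in
    swarm)   printf '%s' 'tasks.' ;;  # Swarm: resolve the task IPs, not the service VIP
    compose) printf '%s' '' ;;        # plain Compose: the service name resolves to the container
    *)       echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper; file and stack names are placeholders.
deploy() {
  local mode="$1"
  CASSANDRA_NODE_PREFIX="$(node_prefix "$mode")" || return 1
  # Exported (even when empty) so Compose substitutes "" without a warning.
  export CASSANDRA_NODE_PREFIX
  if [ "$mode" = swarm ]; then
    docker stack deploy -c docker-compose.yml cassandra
  else
    docker-compose -f docker-compose.yml up -d
  fi
}
```

After sourcing this, deploy swarm or deploy compose would pick the right names. The tasks.<service> names are a Swarm overlay feature, which is presumably why plain Compose ended up resolving tasks.node1 through an outside DNS server in the error above.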