
backup incomplete fqdn and public/private ip

Anasme opened this issue 3 years ago · 7 comments

Hi,

I am having trouble getting a positive result on my backup report with my 3 cassandra nodes.

I use a configuration with two network cards, one of them with public addresses.

Basically, my listen_address is set to each node's hostname, which resolves on a private encrypted network; rpc_address is open on 0.0.0.0, but broadcast_rpc_address is set to the PUBLIC IP of each node, because I have drivers that connect from outside.

I have tried several configurations with medusa including this one:

fqdn = public-ip
resolve_ip_addresses = False

When I back up the 3 nodes, all 3 finish, but the cluster report is incomplete. Example:

- Started: 2022-02-03 00:00:10, Finished: never
- 3 nodes completed, 0 nodes incomplete, 2 nodes missing
- Missing nodes:
    private-ip
- 12696 files, 3.20 GB

I noticed that the private IP displayed is the one of the first node that launched the node backup.

The tokenmap file seems to mix private and public IPs; we don't know exactly why.

My backups are stored in S3, and I can see a folder named with the public IP for each node.

Why does Medusa use the public IP for two nodes, while the first node that launches the backup completes its backup but resolves its private IP?

Is it possible to force the fqdn independently of the Cassandra broadcast_rpc_address, or conversely, to make it depend only on broadcast_rpc_address?

Thank you for your help on the subject!


Anasme avatar Feb 03 '22 07:02 Anasme

I'm having the same issue. Appending entries to the /etc/hosts file of each node that map the other nodes to their public IPs makes it work, but I'm not really satisfied with this method. A way to specify the fqdn in a file would be great.

lilianabiven avatar Feb 03 '22 16:02 lilianabiven

A way to specify the fqdn in a file would be great

That can be done by setting the fqdn in the medusa.ini file. You can also set resolve_ip_addresses to false in the cassandra section of that same file so that the IPs are used instead of the hostnames: https://github.com/thelastpickle/cassandra-medusa/blob/eff15745fe88823c10dc2f7e02a31b81ee97709a/medusa/config.py#L41
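For illustration, a minimal medusa.ini fragment with both settings could look like this (the IP is a placeholder, and the fqdn option is assumed to live in the [storage] section, per medusa's config layout):

```ini
; medusa.ini (illustrative values)
[storage]
fqdn = 203.0.113.1

[cassandra]
resolve_ip_addresses = False
```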

adejanovski avatar Feb 03 '22 16:02 adejanovski

Yes, we set the IP in the fqdn setting of the medusa.ini file, but the Cassandra driver takes the IP from the listen_address field of cassandra.yaml and feeds the tokenmap file with it when the first backup is launched (mixing the private IP with the public ones from the RPC broadcast).

## cassandra_utils.py
class Cassandra(object):
---
    def __init__(self, config, contact_point=None, release_version=None):
---
        # I think this is the moment where it takes the private IP from listen_address
        self._hostname = contact_point if contact_point is not None else config_reader.listen_address

If we have 3 nodes with 3 public IPs set as fqdn, and the listen address is on the private IP, the backup status reports that private IP as a missing node.

We did the same, @lilianabiven: we fed the hosts file with the public IPs of the other two nodes and the private IP for the current node, with resolve_ip_addresses set to true, to get a complete status result... but as you said, this solution is not optimal.

## node server hosts file
# self
ip-private hostname-self-node
# other 2 nodes
ip-public hostname-other-node
ip-public hostname-other-node
## medusa.ini config
fqdn = hostname
resolve_ip_addresses = True

I don't know if the problem is clear enough from this explanation.

Maybe implementing something like this in Medusa, allowing an address_translator to be configured for the driver, would be a solution:

address_translator = <cassandra.policies.IdentityTranslator object>
policies.AddressTranslator instance to be used in translating server node addresses to driver connection addresses.
https://docs.datastax.com/en/developer/python-driver/3.24/api/cassandra/cluster/
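As a rough sketch of that idea: the driver's AddressTranslator contract is a single translate(addr) method, so a static private-to-public mapping could look like the standalone class below (in a real setup you would subclass cassandra.policies.AddressTranslator and pass an instance to Cluster(address_translator=...); all IPs here are placeholders).

```python
class StaticAddressTranslator:
    """Translate private node addresses to their public counterparts.

    Mirrors the cassandra-driver AddressTranslator contract
    (a single translate(addr) method) without importing the driver.
    """

    def __init__(self, mapping):
        # mapping: {private_ip: public_ip}
        self._mapping = dict(mapping)

    def translate(self, addr):
        # Unknown addresses pass through unchanged, like the
        # driver's default IdentityTranslator.
        return self._mapping.get(addr, addr)


translator = StaticAddressTranslator({
    "10.0.0.1": "203.0.113.1",
    "10.0.0.2": "203.0.113.2",
})
print(translator.translate("10.0.0.1"))   # mapped: 203.0.113.1
print(translator.translate("192.0.2.9"))  # unmapped: returned as-is
```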

Thanks anyway for your time; let me know if you need more information.

Anasme avatar Feb 04 '22 07:02 Anasme

Another way to make it work without modifying any config other than Medusa's is:

  • on all nodes, set resolve_ip_addresses = False
  • on the node where the medusa backup-cluster command is executed, set fqdn = <private ip>
  • on all other nodes, set fqdn = <public ip>
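For illustration, the per-node medusa.ini fragments for this workaround might look like the following (IPs are placeholders, and the fqdn option is assumed to live in the [storage] section):

```ini
; medusa.ini, all nodes
[cassandra]
resolve_ip_addresses = False

[storage]
; on the node running `medusa backup-cluster`: its own PRIVATE IP
fqdn = 10.0.0.1
; on every other node, use that node's PUBLIC IP instead, e.g.:
; fqdn = 203.0.113.2
```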

lilianabiven avatar Feb 04 '22 10:02 lilianabiven

In the end, the solution we will be going with is:

  • setting fqdn = <private ip>
  • replacing the private IPs with the public IPs in the tokenmap.json files of the backups we want to restore
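As a hedged sketch of that second step, a small script could rewrite the private IPs to public IPs inside a backup's tokenmap.json before a restore; the IP mapping and the tokenmap layout shown below are illustrative placeholders, not medusa's documented format.

```python
import json

# Placeholder mapping of private listen addresses to public IPs.
IP_MAP = {
    "10.0.0.1": "203.0.113.1",
    "10.0.0.2": "203.0.113.2",
}


def rewrite_tokenmap(text):
    """Replace every private IP with its public IP in the raw JSON text.

    Plain string replacement leaves the rest of the file untouched.
    """
    for private_ip, public_ip in IP_MAP.items():
        text = text.replace(private_ip, public_ip)
    return text


# Example with an assumed tokenmap structure keyed by node address.
original = json.dumps({"10.0.0.1": {"tokens": ["-9223372036854775808"], "is_up": True}})
patched = json.loads(rewrite_tokenmap(original))
print(sorted(patched))  # ['203.0.113.1']
```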

lilianabiven avatar Feb 21 '22 08:02 lilianabiven

Please add your planning poker estimate with ZenHub @adejanovski

jsanda avatar Apr 19 '22 18:04 jsanda

Just placed my estimate @jsanda, based on the assumption that we'd use the address translator from the driver.

adejanovski avatar May 10 '22 08:05 adejanovski

I am having a similar issue with a setup based on Docker Compose. We have 3 physical machines, each running 3 nodes via Docker Compose with two network interfaces. Each node has a private IP for intra-node communication (listen_address) and an external IP for RPC and Thrift clients (rpc_address), configured on the host machines using Docker's host network_mode. When I try to run a simple backup with Medusa, it seems the internal IP addresses (listen_address on port 9042) are resolved for executing CQL commands, which obviously does not work. Any idea how to fix this without the address translator?

vincent-smit avatar May 31 '22 14:05 vincent-smit