cassandra-medusa

Backup-cluster fails on all authentication - or on second attempt

Open tlb1galaxy opened this issue 2 years ago • 2 comments

Hello, I am trying to back up a new Cassandra cluster (4 nodes running CentOS 7) using local storage (an NFS mount shared by all nodes), and all forms of authentication seem to fail.

SSH key authentication is configured between all the nodes. I have also enabled and populated ssh-agent (even though I cannot find any documentation listing this as a requirement):

  • sshd_config (all nodes)
    • AllowAgentForwarding yes
  • ssh_config (all nodes)
    • ForwardAgent yes

ENVIRONMENT:

Cassandra version:

[root@cassandranode03 ~]# nodetool version
ReleaseVersion: 3.11.12

Cassandra status:

[root@cassandranode03 ~]# nodetool status
Datacenter: tlb1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID             Rack
UN  172.16.253.31  245.13 KiB  256          49.0%             1922256d-REMOVED  compass_cassandra01_rack01
UN  172.16.253.33  425.6 KiB  256          46.2%             fc247005-REMOVED  compass_cassandra01_rack01
UN  172.16.253.32  231.49 KiB  256          48.7%             2b094909-REMOVED  compass_cassandra01_rack01
UN  172.16.253.34  275.85 KiB  256          56.0%             9b858d8c-REMOVED  compass_cassandra01_rack01

Python:

[root@cassandranode03 ~]# python --version
Python 2.7.5
[root@cassandranode03 ~]# python3 --version
Python 3.6.8
[root@cassandranode03 ~]# which python
/usr/bin/python
[root@cassandranode03 ~]# which python3
/usr/bin/python3

Medusa:

[root@cassandranode03 ~]# medusa --version
0.12.2

PIP packages:

[root@cassandranode03 ~]# pip3 list installed
Package                Version
---------------------- -----------
apache-libcloud        3.3.1
cassandra-driver       3.25.0
cassandra-medusa       0.12.2
cassandra-pylib        0.0.0
certifi                2021.10.8
cffi                   1.15.0
chardet                3.0.4
click                  8.0.4
click-aliases          1.0.1
cryptography           3.3.2
fasteners              0.16
ffwd                   0.0.2
geomet                 0.2.1.post1
gevent                 21.12.0
greenlet               1.1.2
grpcio                 1.44.0
grpcio-health-checking 1.44.0
grpcio-tools           1.44.0
idna                   2.8
importlib-metadata     4.8.3
lockfile               0.12.2
parallel-ssh           2.2.0
pip                    21.3.1
protobuf               3.19.4
psutil                 5.9.0
pycparser              2.21
pycryptodome           3.14.1
python-dateutil        2.8.0
PyYAML                 6.0
requests               2.22.0
retrying               1.3.3
setuptools             39.2.0
six                    1.16.0
ssh-python             0.10.0
ssh2-python            0.22.0
typing_extensions      4.1.1
urllib3                1.25.11
zipp                   3.6.0
zope.event             4.5.0
zope.interface         5.4.0

OS:

[root@cassandranode03 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

Mounts:

[root@cassandranode03 ~]# df -hT
Filesystem                                             Type      Size  Used Avail Use% Mounted on
devtmpfs                                               devtmpfs  7.9G     0  7.9G   0% /dev
tmpfs                                                  tmpfs     7.9G     0  7.9G   0% /dev/shm
tmpfs                                                  tmpfs     7.9G  8.9M  7.9G   1% /run
tmpfs                                                  tmpfs     7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/mapper/vg01-root                                  xfs       8.4G  1.9G  6.5G  23% /
/dev/sda1                                              xfs       473M  160M  313M  34% /boot
/dev/mapper/vg04_lvm_casslogs01-lvm_casslogs01         xfs        20G   36M   20G   1% /storage/lvm_casslogs01
/dev/mapper/vg01-var                                   xfs       9.4G  553M  8.8G   6% /var
/dev/mapper/vg03_lvm_cassdata01-lvm_cassdata01         xfs        10G   37M   10G   1% /storage/lvm_cassdata01
172.16.253.30:/storage/lvm_backup01/backups/compasscass nfs4      280G   33M  280G   1% /exports/compassfile01/lvm_backup01/backups/compasscass
tmpfs 

SSH auth:

[root@cassandranode03 ~]# ssh [email protected]
Last login: Wed Apr 27 13:17:00 2022 from cassandranode03.tlb1.lab.net
[root@cassandranode01 ~]# exit
logout
Connection to cassandranode01.tlb1.lab.net closed.
[root@cassandranode03 ~]# ssh [email protected]
Last login: Wed Apr 27 14:06:33 2022 from cassandranode03.tlb1.lab.net
[root@cassandranode02 ~]# exit
logout
Connection to cassandranode02.tlb1.lab.net closed.
[root@cassandranode03 ~]# ssh [email protected]
Last login: Wed Apr 27 13:57:35 2022 from cassandranode03.tlb1.lab.net
[root@cassandranode04 ~]# exit
logout
Connection to cassandranode04.tlb1.lab.net closed.
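
The manual logins above cover cassandranode01, 02 and 04, but not the source node connecting to itself, which backup-cluster also does (the debug output in this report shows a connection to cassandranode03.tlb1.lab.net:22). The following is a minimal sketch for repeating the check against all four hosts non-interactively. It assumes root key login and uses OpenSSH's `BatchMode` option so that a missing key fails instead of prompting; `build_ssh_check` and `check_all` are hypothetical helper names, not part of Medusa.

```python
import subprocess

# Hostnames taken from the Medusa debug output in this report.
NODES = [
    "cassandranode01.tlb1.lab.net",
    "cassandranode02.tlb1.lab.net",
    "cassandranode03.tlb1.lab.net",  # the source node itself
    "cassandranode04.tlb1.lab.net",
]


def build_ssh_check(host, user="root"):
    """Return an ssh command that fails fast instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never fall back to interactive prompts
        "-o", "ConnectTimeout=5",
        f"{user}@{host}",
        "true",                      # harmless remote command; only the exit code matters
    ]


def check_all(nodes=NODES):
    """Run the check against every node and map host -> True (login worked) / False."""
    results = {}
    for host in nodes:
        try:
            proc = subprocess.run(
                build_ssh_check(host),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            results[host] = proc.returncode == 0
        except OSError:
            # ssh binary missing or not executable on this machine
            results[host] = False
    return results
```

Calling `check_all()` as root on the source node returns one boolean per host; a `False` for cassandranode03 would point at root's own `authorized_keys` on that node rather than at Medusa.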

SSH-agent:

[root@cassandranode03 ~]# ps -aux | grep ssh-agent
root      1780  0.0  0.0  72552  1228 ?        Ss   14:15   0:00 ssh-agent
root      2243  0.0  0.0 112808   976 pts/0    S+   15:11   0:00 grep --color=auto ssh-agent
[root@cassandranode03 ~]# ssh-add -l
4096 SHA256:{{REMOVED}} /root/.ssh/id_rsa (RSA)

ERRORS:

Medusa command:

### Medusa command output on source
[root@cassandranode03 ~]# medusa -vv backup-cluster --backup-name=manual1321 --mode=full
[2022-04-27 15:13:12,164] DEBUG: Loading configuration from /etc/medusa/medusa.ini
[2022-04-27 15:13:12,168] DEBUG: Resolved 172.16.253.33 to cassandranode03.tlb1.lab.net
[2022-04-27 15:13:12,168] DEBUG: Logging to file options: LoggingConfig(enabled='1', file='medusa.log', format='[%(asctime)s] %(levelname)s: %(message)s', level='INFO', maxBytes='20000000', backupCount='30')
[2022-04-27 15:13:12,170] INFO: Monitoring provider is noop
[2022-04-27 15:13:12,170] DEBUG: Loading storage_provider: local
[2022-04-27 15:13:12,173] INFO: No backups found in index. Consider running "medusa build-index" if you have some backups
[2022-04-27 15:13:12,173] INFO: Starting backup manual1321
[2022-04-27 15:13:12,180] DEBUG: This server has systemd: True
[2022-04-27 15:13:12,490] DEBUG: Connecting to cluster, contact points: ['cassandranode03.tlb1.lab.net']; protocol version: 66
[2022-04-27 15:13:12,491] DEBUG: Host 172.16.253.33:9042 is now marked up
[2022-04-27 15:13:12,491] DEBUG: [control connection] Opening new connection to 172.16.253.33:9042
[2022-04-27 15:13:12,493] DEBUG: Sending initial options message for new connection (140075216313648) to 172.16.253.33:9042
[2022-04-27 15:13:12,494] DEBUG: Defuncting connection (140075216313648) to 172.16.253.33:9042: <Error from server: code=000a [Protocol error] message="Invalid or unsupported protocol version (66); supported versions are (3/v3, 4/v4, 5/v5-beta)">
[2022-04-27 15:13:12,494] DEBUG: Closing connection (140075216313648) to 172.16.253.33:9042
[2022-04-27 15:13:12,494] DEBUG: Closed socket to 172.16.253.33:9042
[2022-04-27 15:13:12,494] DEBUG: Exception in read for <GeventConnection(140075216313648) 172.16.253.33:9042 (defunct)>: [Errno 9] Bad file descriptor
[2022-04-27 15:13:12,495] WARNING: Downgrading core protocol version from 66 to 65 for 172.16.253.33:9042. To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster. http://datastax.github.io/python-driver/api/cassandra/cluster.html#cassandra.cluster.Cluster.protocol_version
[2022-04-27 15:13:12,495] DEBUG: Sending initial options message for new connection (140075216313088) to 172.16.253.33:9042
[2022-04-27 15:13:12,496] DEBUG: Defuncting connection (140075216313088) to 172.16.253.33:9042: <Error from server: code=000a [Protocol error] message="Invalid or unsupported protocol version (65); supported versions are (3/v3, 4/v4, 5/v5-beta)">
[2022-04-27 15:13:12,496] DEBUG: Closing connection (140075216313088) to 172.16.253.33:9042
[2022-04-27 15:13:12,496] DEBUG: Closed socket to 172.16.253.33:9042
[2022-04-27 15:13:12,496] DEBUG: Exception in read for <GeventConnection(140075216313088) 172.16.253.33:9042 (defunct)>: [Errno 9] Bad file descriptor
[2022-04-27 15:13:12,496] WARNING: Downgrading core protocol version from 65 to 5 for 172.16.253.33:9042. To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster. http://datastax.github.io/python-driver/api/cassandra/cluster.html#cassandra.cluster.Cluster.protocol_version
[2022-04-27 15:13:12,497] DEBUG: Sending initial options message for new connection (140075216313368) to 172.16.253.33:9042
[2022-04-27 15:13:12,498] ERROR: Closing connection <GeventConnection(140075216313368) 172.16.253.33:9042> due to protocol error: Error from server: code=000a [Protocol error] message="Beta version of the protocol used (5/v5-beta), but USE_BETA flag is unset"
[2022-04-27 15:13:12,498] DEBUG: Defuncting connection (140075216313368) to 172.16.253.33:9042: <Error from server: code=000a [Protocol error] message="Beta version of the protocol used (5/v5-beta), but USE_BETA flag is unset">
[2022-04-27 15:13:12,499] DEBUG: Closing connection (140075216313368) to 172.16.253.33:9042
[2022-04-27 15:13:12,499] DEBUG: Closed socket to 172.16.253.33:9042
[2022-04-27 15:13:12,499] DEBUG: Exception in read for <GeventConnection(140075216313368) 172.16.253.33:9042 (defunct)>: [Errno 9] Bad file descriptor
[2022-04-27 15:13:12,499] WARNING: Downgrading core protocol version from 5 to 4 for 172.16.253.33:9042. To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster. http://datastax.github.io/python-driver/api/cassandra/cluster.html#cassandra.cluster.Cluster.protocol_version
[2022-04-27 15:13:12,500] DEBUG: Sending initial options message for new connection (140075216313984) to 172.16.253.33:9042
[2022-04-27 15:13:12,503] DEBUG: Received options response on new connection (140075216313984) from 172.16.253.33:9042
[2022-04-27 15:13:12,504] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,504] DEBUG: Sending StartupMessage on <GeventConnection(140075216313984) 172.16.253.33:9042>
[2022-04-27 15:13:12,504] DEBUG: Sent StartupMessage on <GeventConnection(140075216313984) 172.16.253.33:9042>
[2022-04-27 15:13:12,505] DEBUG: Got AuthenticateMessage on new connection (140075216313984) from 172.16.253.33:9042: org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,505] DEBUG: Sending SASL-based auth response on <GeventConnection(140075216313984) 172.16.253.33:9042>
[2022-04-27 15:13:12,602] DEBUG: Connection <GeventConnection(140075216313984) 172.16.253.33:9042> successfully authenticated
[2022-04-27 15:13:12,603] DEBUG: [control connection] Established new connection <GeventConnection(140075216313984) 172.16.253.33:9042>, registering watchers and refreshing schema and topology
[2022-04-27 15:13:12,612] DEBUG: [control connection] Refreshing node list and token map using preloaded results
[2022-04-27 15:13:12,613] INFO: Using datacenter 'tlb1' for DCAwareRoundRobinPolicy (via host '172.16.253.33:9042'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
[2022-04-27 15:13:12,613] DEBUG: [control connection] Found new host to connect to: 172.16.253.32:9042
[2022-04-27 15:13:12,613] INFO: New Cassandra host <Host: 172.16.253.32:9042 tlb1> discovered
[2022-04-27 15:13:12,613] DEBUG: Handling new host <Host: 172.16.253.32:9042 tlb1> and notifying listeners
[2022-04-27 15:13:12,614] DEBUG: Done preparing queries for new host <Host: 172.16.253.32:9042 tlb1>
[2022-04-27 15:13:12,614] DEBUG: Host 172.16.253.32:9042 is now marked up
[2022-04-27 15:13:12,614] DEBUG: [control connection] Found new host to connect to: 172.16.253.31:9042
[2022-04-27 15:13:12,614] INFO: New Cassandra host <Host: 172.16.253.31:9042 tlb1> discovered
[2022-04-27 15:13:12,614] DEBUG: Handling new host <Host: 172.16.253.31:9042 tlb1> and notifying listeners
[2022-04-27 15:13:12,614] DEBUG: Done preparing queries for new host <Host: 172.16.253.31:9042 tlb1>
[2022-04-27 15:13:12,615] DEBUG: Host 172.16.253.31:9042 is now marked up
[2022-04-27 15:13:12,615] DEBUG: [control connection] Found new host to connect to: 172.16.253.34:9042
[2022-04-27 15:13:12,615] INFO: New Cassandra host <Host: 172.16.253.34:9042 tlb1> discovered
[2022-04-27 15:13:12,615] DEBUG: Handling new host <Host: 172.16.253.34:9042 tlb1> and notifying listeners
[2022-04-27 15:13:12,615] DEBUG: Done preparing queries for new host <Host: 172.16.253.34:9042 tlb1>
[2022-04-27 15:13:12,615] DEBUG: Host 172.16.253.34:9042 is now marked up
[2022-04-27 15:13:12,615] DEBUG: [control connection] Finished fetching ring info
[2022-04-27 15:13:12,615] DEBUG: [control connection] Rebuilding token map due to topology changes
[2022-04-27 15:13:12,636] DEBUG: Control connection created
[2022-04-27 15:13:12,637] DEBUG: Initializing connection for host 172.16.253.33:9042
[2022-04-27 15:13:12,637] DEBUG: Initializing connection for host 172.16.253.32:9042
[2022-04-27 15:13:12,638] DEBUG: Sending initial options message for new connection (140075216325152) to 172.16.253.33:9042
[2022-04-27 15:13:12,640] DEBUG: Received options response on new connection (140075216325152) from 172.16.253.33:9042
[2022-04-27 15:13:12,640] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,640] DEBUG: Sending StartupMessage on <GeventConnection(140075216325152) 172.16.253.33:9042>
[2022-04-27 15:13:12,640] DEBUG: Sent StartupMessage on <GeventConnection(140075216325152) 172.16.253.33:9042>
[2022-04-27 15:13:12,642] DEBUG: Got AuthenticateMessage on new connection (140075216325152) from 172.16.253.33:9042: org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,642] DEBUG: Sending SASL-based auth response on <GeventConnection(140075216325152) 172.16.253.33:9042>
[2022-04-27 15:13:12,645] DEBUG: Sending initial options message for new connection (140075215933224) to 172.16.253.32:9042
[2022-04-27 15:13:12,646] DEBUG: Received options response on new connection (140075215933224) from 172.16.253.32:9042
[2022-04-27 15:13:12,646] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,646] DEBUG: Sending StartupMessage on <GeventConnection(140075215933224) 172.16.253.32:9042>
[2022-04-27 15:13:12,646] DEBUG: Sent StartupMessage on <GeventConnection(140075215933224) 172.16.253.32:9042>
[2022-04-27 15:13:12,647] DEBUG: Got AuthenticateMessage on new connection (140075215933224) from 172.16.253.32:9042: org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,647] DEBUG: Sending SASL-based auth response on <GeventConnection(140075215933224) 172.16.253.32:9042>
[2022-04-27 15:13:12,739] DEBUG: Connection <GeventConnection(140075216325152) 172.16.253.33:9042> successfully authenticated
[2022-04-27 15:13:12,739] DEBUG: Finished initializing connection for host 172.16.253.33:9042
[2022-04-27 15:13:12,739] DEBUG: Added pool for host 172.16.253.33:9042 to session
[2022-04-27 15:13:12,740] DEBUG: Initializing connection for host 172.16.253.31:9042
[2022-04-27 15:13:12,741] DEBUG: Sending initial options message for new connection (140075215931600) to 172.16.253.31:9042
[2022-04-27 15:13:12,742] DEBUG: Received options response on new connection (140075215931600) from 172.16.253.31:9042
[2022-04-27 15:13:12,742] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,742] DEBUG: Sending StartupMessage on <GeventConnection(140075215931600) 172.16.253.31:9042>
[2022-04-27 15:13:12,742] DEBUG: Sent StartupMessage on <GeventConnection(140075215931600) 172.16.253.31:9042>
[2022-04-27 15:13:12,743] DEBUG: Got AuthenticateMessage on new connection (140075215931600) from 172.16.253.31:9042: org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,743] DEBUG: Sending SASL-based auth response on <GeventConnection(140075215931600) 172.16.253.31:9042>
[2022-04-27 15:13:12,747] DEBUG: Connection <GeventConnection(140075215933224) 172.16.253.32:9042> successfully authenticated
[2022-04-27 15:13:12,747] DEBUG: Finished initializing connection for host 172.16.253.32:9042
[2022-04-27 15:13:12,747] DEBUG: Added pool for host 172.16.253.32:9042 to session
[2022-04-27 15:13:12,747] DEBUG: Initializing connection for host 172.16.253.34:9042
[2022-04-27 15:13:12,748] DEBUG: Sending initial options message for new connection (140075215920936) to 172.16.253.34:9042
[2022-04-27 15:13:12,750] DEBUG: Received options response on new connection (140075215920936) from 172.16.253.34:9042
[2022-04-27 15:13:12,750] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,750] DEBUG: Sending StartupMessage on <GeventConnection(140075215920936) 172.16.253.34:9042>
[2022-04-27 15:13:12,750] DEBUG: Sent StartupMessage on <GeventConnection(140075215920936) 172.16.253.34:9042>
[2022-04-27 15:13:12,751] DEBUG: Got AuthenticateMessage on new connection (140075215920936) from 172.16.253.34:9042: org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,751] DEBUG: Sending SASL-based auth response on <GeventConnection(140075215920936) 172.16.253.34:9042>
[2022-04-27 15:13:12,842] DEBUG: Connection <GeventConnection(140075215931600) 172.16.253.31:9042> successfully authenticated
[2022-04-27 15:13:12,842] DEBUG: Finished initializing connection for host 172.16.253.31:9042
[2022-04-27 15:13:12,843] DEBUG: Added pool for host 172.16.253.31:9042 to session
[2022-04-27 15:13:12,847] DEBUG: Connection <GeventConnection(140075215920936) 172.16.253.34:9042> successfully authenticated
[2022-04-27 15:13:12,848] DEBUG: Finished initializing connection for host 172.16.253.34:9042
[2022-04-27 15:13:12,848] DEBUG: Added pool for host 172.16.253.34:9042 to session
[2022-04-27 15:13:12,848] DEBUG: Not starting MonitorReporter thread for Insights; not supported by server version 3.11.12 on ControlConnection host 172.16.253.33:9042
[2022-04-27 15:13:12,848] DEBUG: Started Session with client_id bc5b9d88-c244-417e-8cef-c3c36a3fc7a4 and session_id 85a57b50-bdce-4347-a86e-6f394730fae9
[2022-04-27 15:13:12,848] DEBUG: Checking placement using dc and rack...
[2022-04-27 15:13:12,849] DEBUG: Resolved 172.16.253.33 to cassandranode03.tlb1.lab.net
[2022-04-27 15:13:12,850] DEBUG: Checking host 172.16.253.33 against 172.16.253.33/cassandranode03.tlb1.lab.net
[2022-04-27 15:13:12,851] DEBUG: Resolved 172.16.253.31 to cassandranode01.tlb1.lab.net
[2022-04-27 15:13:12,852] DEBUG: Resolved 172.16.253.32 to cassandranode02.tlb1.lab.net
[2022-04-27 15:13:12,852] DEBUG: Resolved 172.16.253.33 to cassandranode03.tlb1.lab.net
[2022-04-27 15:13:12,853] DEBUG: Resolved 172.16.253.34 to cassandranode04.tlb1.lab.net
[2022-04-27 15:13:12,853] DEBUG: Closing connection (140075216325152) to 172.16.253.33:9042
[2022-04-27 15:13:12,853] DEBUG: Closed socket to 172.16.253.33:9042
[2022-04-27 15:13:12,853] DEBUG: Closing connection (140075215933224) to 172.16.253.32:9042
[2022-04-27 15:13:12,853] DEBUG: Closed socket to 172.16.253.32:9042
[2022-04-27 15:13:12,854] DEBUG: Closing connection (140075215931600) to 172.16.253.31:9042
[2022-04-27 15:13:12,854] DEBUG: Closed socket to 172.16.253.31:9042
[2022-04-27 15:13:12,854] DEBUG: Closing connection (140075215920936) to 172.16.253.34:9042
[2022-04-27 15:13:12,854] DEBUG: Closed socket to 172.16.253.34:9042
[2022-04-27 15:13:12,854] DEBUG: Shutting down Cluster Scheduler
[2022-04-27 15:13:12,854] DEBUG: Shutting down control connection
[2022-04-27 15:13:12,854] DEBUG: Closing connection (140075216313984) to 172.16.253.33:9042
[2022-04-27 15:13:12,855] DEBUG: Closed socket to 172.16.253.33:9042
[2022-04-27 15:13:12,855] INFO: Creating snapshots on all nodes
[2022-04-27 15:13:12,855] INFO: Executing "nodetool snapshot -t medusa-manual1321" on following nodes ['cassandranode01.tlb1.lab.net', 'cassandranode02.tlb1.lab.net', 'cassandranode03.tlb1.lab.net', 'cassandranode04.tlb1.lab.net'] with a parallelism/pool size of 500
[2022-04-27 15:13:12,855] DEBUG: Batch #1: Running "nodetool snapshot -t medusa-manual1321" on nodes ['cassandranode01.tlb1.lab.net', 'cassandranode02.tlb1.lab.net', 'cassandranode03.tlb1.lab.net', 'cassandranode04.tlb1.lab.net'] parallelism of 4
[2022-04-27 15:13:12,856] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,856] DEBUG: Make client request for host cassandranode01.tlb1.lab.net, (host_i, host) in clients: False
[2022-04-27 15:13:12,856] DEBUG: Connecting to cassandranode01.tlb1.lab.net:22
[2022-04-27 15:13:12,856] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,857] DEBUG: Make client request for host cassandranode02.tlb1.lab.net, (host_i, host) in clients: False
[2022-04-27 15:13:12,857] DEBUG: Connecting to cassandranode02.tlb1.lab.net:22
[2022-04-27 15:13:12,857] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,857] DEBUG: Make client request for host cassandranode03.tlb1.lab.net, (host_i, host) in clients: False
[2022-04-27 15:13:12,857] DEBUG: Connecting to cassandranode03.tlb1.lab.net:22
[2022-04-27 15:13:12,858] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,858] DEBUG: Make client request for host cassandranode04.tlb1.lab.net, (host_i, host) in clients: False
[2022-04-27 15:13:12,858] DEBUG: Connecting to cassandranode04.tlb1.lab.net:22
[2022-04-27 15:13:12,859] DEBUG: Starting new session for [email protected]:22
[2022-04-27 15:13:12,859] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:12,927] DEBUG: Agent auth failed with b"Access denied for 'publickey'. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password", continuing with other authentication methods
[2022-04-27 15:13:12,928] DEBUG: Trying to authenticate with identity file /root/.ssh/id_rsa
[2022-04-27 15:13:12,941] DEBUG: Authentication with identity file /root/.ssh/id_rsa failed, continuing with other identities
[2022-04-27 15:13:12,941] DEBUG: Starting new session for [email protected]:22
[2022-04-27 15:13:12,941] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:13,015] DEBUG: Authentication with SSH Agent succeeded.
[2022-04-27 15:13:13,015] DEBUG: Authentication completed successfully - setting session to non-blocking mode
[2022-04-27 15:13:13,015] DEBUG: Opening new channel on cassandranode02.tlb1.lab.net
[2022-04-27 15:13:13,015] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,015] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,016] DEBUG: Starting new session for [email protected]:22
[2022-04-27 15:13:13,016] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:13,091] DEBUG: Authentication with SSH Agent succeeded.
[2022-04-27 15:13:13,091] DEBUG: Authentication completed successfully - setting session to non-blocking mode
[2022-04-27 15:13:13,091] DEBUG: Opening new channel on cassandranode01.tlb1.lab.net
[2022-04-27 15:13:13,091] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,091] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,091] DEBUG: Starting new session for [email protected]:22
[2022-04-27 15:13:13,092] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:13,165] DEBUG: Authentication with SSH Agent succeeded.
[2022-04-27 15:13:13,166] DEBUG: Authentication completed successfully - setting session to non-blocking mode
[2022-04-27 15:13:13,166] DEBUG: Opening new channel on cassandranode04.tlb1.lab.net
[2022-04-27 15:13:13,166] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,166] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,267] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,273] DEBUG: Starting output generator on channel <ssh.channel.Channel object at 0x7f65cd847dc8> for stdout
[2022-04-27 15:13:13,273] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,273] DEBUG: Starting output generator on channel <ssh.channel.Channel object at 0x7f65cd847dc8> for stderr
[2022-04-27 15:13:13,273] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,292] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,293] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,369] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,369] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,373] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,374] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,414] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,420] DEBUG: Starting output generator on channel <ssh.channel.Channel object at 0x7f65cd847630> for stdout
[2022-04-27 15:13:13,420] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,420] DEBUG: Starting output generator on channel <ssh.channel.Channel object at 0x7f65cd847630> for stderr
[2022-04-27 15:13:13,420] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,474] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,474] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,494] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,500] DEBUG: Starting output generator on channel <ssh.channel.Channel object at 0x7f65cd847288> for stdout
[2022-04-27 15:13:13,500] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,500] DEBUG: Starting output generator on channel <ssh.channel.Channel object at 0x7f65cd847288> for stderr
[2022-04-27 15:13:13,501] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,521] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,521] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,574] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,575] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,601] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,601] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,621] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,621] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,676] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,676] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,701] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,701] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,722] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,722] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,776] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,776] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,802] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,802] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,823] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,823] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,877] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,877] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,902] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,902] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,923] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,924] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:13,977] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:13,977] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:14,002] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:14,003] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:14,024] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,024] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,077] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,078] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,103] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,103] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,125] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:14,125] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:14,171] DEBUG: Writing 120 bytes to stdout buffer
[2022-04-27 15:13:14,172] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,172] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:14,172] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:14,204] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:14,204] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:14,225] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,226] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,273] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,273] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,273] DEBUG: Writing 3288 bytes to stderr buffer
[2022-04-27 15:13:14,274] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,274] DEBUG: Channel is at EOF trying to read stdout - reader exiting
[2022-04-27 15:13:14,305] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,305] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,312] DEBUG: Writing 120 bytes to stdout buffer
[2022-04-27 15:13:14,312] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,313] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:14,313] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:14,374] DEBUG: Channel is at EOF trying to read stderr - reader exiting
[2022-04-27 15:13:14,389] DEBUG: Writing 120 bytes to stdout buffer
[2022-04-27 15:13:14,389] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,389] DEBUG: No data for stderr, waiting
[2022-04-27 15:13:14,390] DEBUG: No data for stdout, waiting
[2022-04-27 15:13:14,414] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,414] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,414] DEBUG: Writing 3288 bytes to stderr buffer
[2022-04-27 15:13:14,414] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,414] DEBUG: Channel is at EOF trying to read stdout - reader exiting
[2022-04-27 15:13:14,490] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,490] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,490] DEBUG: Writing 3288 bytes to stderr buffer
[2022-04-27 15:13:14,490] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,491] DEBUG: Channel is at EOF trying to read stdout - reader exiting
[2022-04-27 15:13:14,515] DEBUG: Channel is at EOF trying to read stderr - reader exiting
[2022-04-27 15:13:14,591] DEBUG: Channel is at EOF trying to read stderr - reader exiting
[2022-04-27 15:13:17,946] DEBUG: Agent auth failed with b"Access denied for 'publickey'. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password", continuing with other authentication methods
[2022-04-27 15:13:17,946] DEBUG: Trying to authenticate with identity file /root/.ssh/id_rsa
[2022-04-27 15:13:17,959] DEBUG: Authentication with identity file /root/.ssh/id_rsa failed, continuing with other identities
[2022-04-27 15:13:22,966] DEBUG: Agent auth failed with b"Access denied for 'publickey'. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password", continuing with other authentication methods
[2022-04-27 15:13:22,966] DEBUG: Trying to authenticate with identity file /root/.ssh/id_rsa
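
The output above contains four "Connecting to ...:22" lines but only three "Opening new channel on ..." lines. A small sketch for extracting the host that never got a channel is shown here; `hosts_without_channel` is a hypothetical helper, and the regular expressions assume the log format seen in this report.

```python
import re

# Patterns assume the parallel-ssh debug lines exactly as they appear in this report.
CONNECT_RE = re.compile(r"DEBUG: Connecting to (\S+):22$")
CHANNEL_RE = re.compile(r"DEBUG: Opening new channel on (\S+)$")


def hosts_without_channel(log_lines):
    """Return hosts that were contacted over SSH but never had a channel opened."""
    attempted = []   # keeps first-seen order
    opened = set()
    for raw in log_lines:
        line = raw.rstrip()
        match = CONNECT_RE.search(line)
        if match:
            if match.group(1) not in attempted:
                attempted.append(match.group(1))
            continue
        match = CHANNEL_RE.search(line)
        if match:
            opened.add(match.group(1))
    return [host for host in attempted if host not in opened]


# Lines copied from the debug output above.
SAMPLE = """\
[2022-04-27 15:13:12,856] DEBUG: Connecting to cassandranode01.tlb1.lab.net:22
[2022-04-27 15:13:12,857] DEBUG: Connecting to cassandranode02.tlb1.lab.net:22
[2022-04-27 15:13:12,857] DEBUG: Connecting to cassandranode03.tlb1.lab.net:22
[2022-04-27 15:13:12,858] DEBUG: Connecting to cassandranode04.tlb1.lab.net:22
[2022-04-27 15:13:13,015] DEBUG: Opening new channel on cassandranode02.tlb1.lab.net
[2022-04-27 15:13:13,091] DEBUG: Opening new channel on cassandranode01.tlb1.lab.net
[2022-04-27 15:13:13,166] DEBUG: Opening new channel on cassandranode04.tlb1.lab.net
""".splitlines()

print(hosts_without_channel(SAMPLE))  # ['cassandranode03.tlb1.lab.net']
```

On this log the only host left over is cassandranode03, so the repeated "Agent auth failed" lines most likely belong to the source node's session with itself.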

Target node - /var/log/secure:

### Target Cassandra node /var/log/secure
Apr 27 15:13:13 cassandranode01 sshd[16492]: Accepted publickey for root from 172.16.253.33 port 33340 ssh2: RSA SHA256:{{REMOVED}}
Apr 27 15:13:13 cassandranode01 sshd[16492]: pam_unix(sshd:session): session opened for user root by (uid=0)
Apr 27 15:13:13 cassandranode01 sudo:    root : TTY=unknown ; PWD=/root ; USER=root ; COMMAND=/bin/bash -c nodetool snapshot -t medusa-manual1321
Apr 27 15:13:13 cassandranode01 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 27 15:13:14 cassandranode01 sudo: pam_unix(sudo:session): session closed for user root
Apr 27 15:13:54 cassandranode01 sshd[16492]: pam_unix(sshd:session): session closed for user root

Source node - /var/log/secure:

### Source Cassandra node /var/log/secure
Apr 27 15:13:22 cassandranode03 sshd[2260]: error: maximum authentication attempts exceeded for root from 172.16.253.33 port 41896 ssh2 [preauth]
Apr 27 15:13:22 cassandranode03 sshd[2260]: Disconnecting: Too many authentication failures [preauth]

Cassandra.yaml:

### cassandra.yaml
# Cassandra storage config YAML


cluster_name: 'compass_cassandra01'

num_tokens: 256

hinted_handoff_enabled: true

max_hint_window_in_ms: 10800000 # 3 hours

hinted_handoff_throttle_in_kb: 1024

max_hints_delivery_threads: 2

hints_directory: /var/lib/cassandra/hints

hints_flush_period_in_ms: 10000

max_hints_file_size_in_mb: 128

batchlog_replay_throttle_in_kb: 1024

authenticator: PasswordAuthenticator

authorizer: CassandraAuthorizer

role_manager: CassandraRoleManager

roles_validity_in_ms: 2000

permissions_validity_in_ms: 2000

credentials_validity_in_ms: 2000

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

data_file_directories:
    - /storage/lvm_cassdata01/cassandra/data

commitlog_directory: /storage/lvm_casslogs01/cassandra/commitlog

cdc_enabled: false

disk_failure_policy: stop

commit_failure_policy: stop

prepared_statements_cache_size_mb:

thrift_prepared_statements_cache_size_mb:

key_cache_size_in_mb:

key_cache_save_period: 14400

row_cache_size_in_mb: 0

row_cache_save_period: 0

counter_cache_size_in_mb:

counter_cache_save_period: 7200

saved_caches_directory: /var/lib/cassandra/saved_caches

commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

commitlog_segment_size_in_mb: 32

seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "172.16.253.31,172.16.253.32,172.16.253.33"

concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

concurrent_materialized_view_writes: 32

memtable_allocation_type: heap_buffers

index_summary_capacity_in_mb:

index_summary_resize_interval_in_minutes: 60

trickle_fsync: false
trickle_fsync_interval_in_kb: 10240

storage_port: 7000

ssl_storage_port: 7001

listen_address: 172.16.253.33

start_native_transport: true

native_transport_port: 9042

start_rpc: false

rpc_address: 172.16.253.33

rpc_port: 9160

rpc_keepalive: true

rpc_server_type: sync

thrift_framed_transport_size_in_mb: 15

incremental_backups: false

snapshot_before_compaction: false

auto_snapshot: true

column_index_size_in_kb: 64

column_index_cache_size_in_kb: 2

compaction_throughput_mb_per_sec: 16

sstable_preemptive_open_interval_in_mb: 50

read_request_timeout_in_ms: 5000

range_request_timeout_in_ms: 10000

write_request_timeout_in_ms: 2000

counter_write_request_timeout_in_ms: 5000

cas_contention_timeout_in_ms: 1000

truncate_request_timeout_in_ms: 60000

request_timeout_in_ms: 10000

slow_query_log_timeout_in_ms: 500

cross_node_timeout: false

endpoint_snitch: GossipingPropertyFileSnitch

dynamic_snitch_update_interval_in_ms: 100

dynamic_snitch_reset_interval_in_ms: 600000

dynamic_snitch_badness_threshold: 0.1

request_scheduler: org.apache.cassandra.scheduler.NoScheduler

server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: {{REMOVED}}
    truststore: conf/.truststore
    truststore_password: {{REMOVED}}

client_encryption_options:
    enabled: false
    optional: false
    keystore: conf/.keystore
    keystore_password: {{REMOVED}}
    
internode_compression: dc

inter_dc_tcp_nodelay: false


tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800

enable_user_defined_functions: false

enable_scripted_user_defined_functions: false

windows_timer_interval: 1

transparent_data_encryption_options:
    enabled: false
    chunk_length_kb: 64
    cipher: AES/CBC/PKCS5Padding
    key_alias: testing:1
    # CBC IV length for AES needs to be 16 bytes (which is also the default size)
    # iv_length: 16
    key_provider:
      - class_name: org.apache.cassandra.security.JKSKeyProvider
        parameters:
          - keystore: conf/.keystore
            keystore_password: {{REMOVED}}
            store_type: JCEKS
            key_password: {{REMOVED}}


tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000

replica_filtering_protection:
    
    cached_rows_warn_threshold: 2000
    cached_rows_fail_threshold: 32000


batch_size_warn_threshold_in_kb: 5

batch_size_fail_threshold_in_kb: 50

unlogged_batch_across_partitions_warn_threshold: 10

compaction_large_partition_warning_threshold_mb: 100

gc_warn_threshold_in_ms: 1000

back_pressure_enabled: false

back_pressure_strategy:
    - class_name: org.apache.cassandra.net.RateBasedBackPressure
      parameters:
        - high_ratio: 0.90
          factor: 5
          flow: FAST

enable_materialized_views: true

enable_sasi_indexes: true

Medusa.ini:

### cat /etc/medusa/medusa.ini

[cassandra]
;stop_cmd = /etc/init.d/cassandra stop
;start_cmd = /etc/init.d/cassandra start
config_file = /etc/cassandra/default.conf/cassandra.yaml
cql_username = cassandraadmin
cql_password = {{REMOVED}}
;nodetool_username =  <my nodetool username>
;nodetool_password =  <my nodetool password>
;nodetool_password_file_path = <path to nodetool password file>
;nodetool_host = <host name or IP to use for nodetool>
;nodetool_port = <port number to use for nodetool>
;certfile= <Client SSL: path to rootCa certificate>
;usercert= <Client SSL: path to user certificate>
;userkey= <Client SSL: path to user key>
;sstableloader_ts = <Client SSL: full path to truststore>
;sstableloader_tspw = <Client SSL: password of the truststore>
;sstableloader_ks = <Client SSL: full path to keystore>
;sstableloader_kspw = <Client SSL: password of the keystore>
;sstableloader_bin = <Location of the sstableloader binary if not in PATH>

; Enable this to add the '--ssl' parameter to nodetool. The nodetool-ssl.properties is expected to be in the normal location
;nodetool_ssl = true

; Command ran to verify if Cassandra is running on a node. Defaults to "nodetool version"
check_running = nodetool version

; Disable/Enable ip address resolving.
; Disabling this can help when fqdn resolving gives different domain names for local and remote nodes
; which makes backup succeed but Medusa sees them as incomplete.
; Defaults to True.
resolve_ip_addresses = True

; When true, almost all commands executed by Medusa are prefixed with `sudo`.
; Does not affect the use_sudo_for_restore setting in the 'storage' section.
; See https://github.com/thelastpickle/cassandra-medusa/issues/318
; Defaults to True
;use_sudo = True

[storage]
storage_provider = local
; storage_provider should be either of "local", "google_storage" or "s3"
region = <Region hosting the storage>

; Name of the bucket used for storing backups
bucket_name = cassandra_backups

; JSON key file for service account with access to GCS bucket or AWS credentials file (home-dir/.aws/credentials)
key_file = /etc/medusa/credentials

; Path of the local storage bucket (used only with 'local' storage provider)
base_path = /exports/compassfile01/lvm_backup01/backups/compasscass

; Any prefix used for multitenancy in the same bucket
prefix = tlb1.compass_cassandra01_rack01

;fqdn = <enforce the name of the local node. Computed automatically if not provided.>

; Number of days before backups are purged. 0 means backups don't get purged by age (default)
max_backup_age = 15
; Number of backups to retain. Older backups will get purged beyond that number. 0 means backups don't get purged by count (default)
max_backup_count = 0
; Both thresholds can be defined for backup purge.

; Used to throttle S3 backups/restores:
transfer_max_bandwidth = 50MB/s

; Max number of downloads/uploads. Not used by the GCS backend.
concurrent_transfers = 1

; Size over which S3 uploads will be using the awscli with multi part uploads. Defaults to 100MB.
multi_part_upload_threshold = 104857600

; GC grace period for backed up files. Prevents race conditions between purge and running backups
backup_grace_period_in_days = 10

; When not using sstableloader to restore data on a node, Medusa will copy snapshot files from a
; temporary location into the cassandra data directory. Medusa will then attempt to change the
; ownership of the snapshot files so the cassandra user can access them.
; Depending on how users/file permissions are set up on the cassandra instance, the medusa user
; may need elevated permissions to manipulate the files in the cassandra data directory.
;
; This option does NOT replace the `use_sudo` option under the 'cassandra' section!
; See: https://github.com/thelastpickle/cassandra-medusa/pull/399
;
; Defaults to True
;use_sudo_for_restore = True

;api_profile = <AWS profile to use>

;host = <Optional object storage host to connect to>
;port = <Optional object storage port to connect to>

; Configures the use of SSL to connect to the object storage system.
;secure = True

;aws_cli_path = <Location of the aws cli binary if not in PATH>

[monitoring]
;monitoring_provider = <Provider used for sending metrics. Currently either of "ffwd" or "local">

[ssh]
;username = <SSH username to use for restoring clusters>
;key_file = <SSH key for use for restoring clusters. Expected in PEM unencrypted format.>
;port = <SSH port for use for restoring clusters. Default to port 22.>
;cert_file = <Path of public key signed certificate file to use for authentication. The corresponding private key must also be provided via key_file parameter>

[checks]
;health_check = <Which ports to check when verifying a node restored properly. Options are 'cql' (default), 'thrift', 'all'.>
;query = <CQL query to run after a restore to verify it went OK>
;expected_rows = <Number of rows expected to be returned when the query runs. Not checked if not specified.>
;expected_result = <Comma separated string representation of values returned by the query. Checks only 1st row returned, and only if specified>
;enable_md5_checks = <During backups and verify, use md5 calculations to determine file integrity (in addition to size, which is used by default)>

[logging]
; Controls file logging, disabled by default.
enabled = 1
file = medusa.log
level = INFO

; Control the log output format
format = [%(asctime)s] %(levelname)s: %(message)s

; Size over which log file will rotate
maxBytes = 20000000

; How many log files to keep
backupCount = 30

[grpc]
; Set to true when running in grpc server mode.
; Allows to propagate the exceptions instead of exiting the program.
;enabled = False

[kubernetes]
; The following settings are only intended to be configured if Medusa is running in containers, preferably in Kubernetes.
;enabled = False
;cassandra_url = <URL of the management API snapshot endpoint. For example: http://127.0.0.1:8080/api/v0/ops/node/snapshots>

; Enables the use of the management API to create snapshots. Falls back to using Jolokia if not enabled.
;use_mgmt_api = True


tlb1galaxy, Apr 27 '22 20:04

Possible Resolution: After spending a good deal of time on this, I finally got the 'backup-cluster' function to work. These are the conditions I had to meet:

SSH-agent and forwarding:

  • Not required
    • Only needed if you want to avoid leaving SSH keys on the hosts. I removed it and pointed medusa.ini at the key instead (details further down)

SSH key-auth:

  • Every node must be able to SSH with key authentication into every other node, including itself
    • Medusa (backup-cluster) is run (the backup is initiated) from one of the nodes, so the source Medusa node also needs to be able to SSH into itself
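
The key-auth requirement above can be checked before running a backup. This is a minimal sketch, not how Medusa itself connects: it shells out to the OpenSSH `ssh` client, and the function names, the default user and the default key path are assumptions chosen to match this cluster. `BatchMode=yes` makes a missing key fail instead of prompting for a password, and `IdentitiesOnly=yes` offers only the named key, which may also help avoid the "Too many authentication failures" seen in the source node's log when an agent holds several keys.

```python
import subprocess


def ssh_check_command(host, user="root", key_file="/root/.ssh/id_rsa"):
    """Build a non-interactive ssh command that succeeds only if key auth works.

    The user and key_file defaults are assumptions for this cluster.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",       # fail instead of prompting for a password
        "-o", "IdentitiesOnly=yes",  # offer only the key passed with -i
        "-o", "ConnectTimeout=5",
        "-i", key_file,
        f"{user}@{host}",
        "true",                      # harmless remote command
    ]


def check_key_auth(hosts, runner=subprocess.run, **kwargs):
    """Return {host: bool}; the host list should include the local node too."""
    results = {}
    for host in hosts:
        proc = runner(
            ssh_check_command(host, **kwargs),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[host] = proc.returncode == 0
    return results
```

Run on each node in turn, `check_key_auth(["172.16.253.31", "172.16.253.32", "172.16.253.33", "172.16.253.34"])` should report `True` for all four addresses, including the node's own.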

SUDOERS - secure_path: See the existing issue #253.

In /etc/sudoers (edited via visudo), add the following 2 paths to the 'secure_path' line:

  • /usr/local/bin
  • /opt/cassandra/bin

# Adding HOME to env_keep may enable a user to run unrestricted
# commands via sudo.
#
# Defaults   env_keep += "HOME"

Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/opt/cassandra/bin
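
To confirm the change on every node, the sudoers text can be checked for the two directories. This is a rough sketch under assumptions: the function name is hypothetical, it only looks at a plain `Defaults secure_path = ...` line like the one above, and it ignores per-user or per-host `Defaults` variants and included files.

```python
import re

# Directories that had to be added in this setup.
REQUIRED_DIRS = ["/usr/local/bin", "/opt/cassandra/bin"]

# Matches an uncommented "Defaults secure_path = ..." line, quoted or not.
SECURE_PATH_RE = re.compile(
    r'^[ \t]*Defaults[ \t]+secure_path[ \t]*=[ \t]*"?([^"\n]*)"?[ \t]*$',
    re.MULTILINE,
)


def missing_secure_path_dirs(sudoers_text, required=REQUIRED_DIRS):
    """Return the required directories that are absent from secure_path."""
    match = SECURE_PATH_RE.search(sudoers_text)
    if match is None:
        # No secure_path line found: report everything as missing.
        return list(required)
    dirs = [d.strip() for d in match.group(1).split(":")]
    return [d for d in required if d not in dirs]
```

Reading the file requires root, for example `missing_secure_path_dirs(open("/etc/sudoers").read())`; an empty list means both directories are present.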

/etc/medusa/medusa.ini - handle SSH keys: The default example medusa.ini has 2 settings with the same name, 'key_file':

  • [storage] - key_file
  • [ssh] - key_file

You need to ensure only one is active:

[storage]
storage_provider = local
; storage_provider should be either of "local", "google_storage" or "s3"
; region = <Region hosting the storage>

; Name of the bucket used for storing backups
bucket_name = cassandra_backups

; JSON key file for service account with access to GCS bucket or AWS credentials file (home-dir/.aws/credentials)
; key_file = /etc/medusa/credentials

; Path of the local storage bucket (used only with 'local' storage provider)
base_path = /exports/compassfile01/lvm_backup01/backups/compasscass

; Any prefix used for multitenancy in the same bucket
prefix = tlb1.compass_cassandra01_rack01

;fqdn = <enforce the name of the local node. Computed automatically if not provided.>

; Number of days before backups are purged. 0 means backups don't get purged by age (default)
max_backup_age = 15
; Number of backups to retain. Older backups will get purged beyond that number. 0 means backups don't get purged by count (default)
max_backup_count = 0
; Both thresholds can be defined for backup purge.

; Used to throttle S3 backups/restores:
transfer_max_bandwidth = 50MB/s

; Max number of downloads/uploads. Not used by the GCS backend.
concurrent_transfers = 1

; Size over which S3 uploads will be using the awscli with multi part uploads. Defaults to 100MB.
multi_part_upload_threshold = 104857600

; GC grace period for backed up files. Prevents race conditions between purge and running backups
backup_grace_period_in_days = 10

[ssh]
username = root
key_file = /root/.ssh/id_rsa
;port = <SSH port for use for restoring clusters. Default to port 22.>
;cert_file = <Path of public key signed certificate file to use for authentication. The corresponding private key must also be provided via key_file parameter>
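
To see which 'key_file' settings are active in a given medusa.ini, the file can be read with Python's standard `configparser`. This is only a sketch for inspecting the file: the function name is hypothetical, and it says nothing about how Medusa itself resolves the two settings. Lines starting with `;` are treated as comments, so a commented-out `key_file` is not reported.

```python
import configparser


def active_key_files(ini_text):
    """Return {section: key_file} for each section with an uncommented key_file."""
    # interpolation=None keeps values such as '%(asctime)s' in [logging] literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        section: parser[section]["key_file"]
        for section in parser.sections()
        if parser.has_option(section, "key_file")
    }
```

With the configuration above, `active_key_files(open("/etc/medusa/medusa.ini").read())` should contain only the `ssh` entry.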

tlb1galaxy, Apr 28 '22 18:04

I was not able to reproduce this.

The [ssh] section needs the username/password. It might be nice to default to $USER and ~/.ssh/id_rsa, but that's perhaps for another issue.

The clash of cassandra/key_file and ssh/key_file does not seem to be a thing either.

rzvoncek, Apr 04 '24 12:04