cassandra-medusa
restore-cluster command shows error during execution
We see an error while running the restore-cluster command. The error message is:
[2022-07-13 04:48:55,164] ERROR: Host 'node1-dc1.abcd.com' is up, but port '9042' is closed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/medusa/cassandra_utils.py", line 715, in is_open
    s.connect((host, port))
  File "/usr/local/lib64/python3.6/site-packages/gevent/_socketcommon.py", line 607, in connect
    raise _SocketError(err, strerror(err))
ConnectionRefusedError: [Errno 111] Connection refused
[2022-07-13 04:48:55,166] ERROR: Host 'node1-dc1.abcd.com' is up, but port '7001' is closed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/medusa/cassandra_utils.py", line 715, in is_open
    s.connect((host, port))
  File "/usr/local/lib64/python3.6/site-packages/gevent/_socketcommon.py", line 607, in connect
    raise _SocketError(err, strerror(err))
ConnectionRefusedError: [Errno 111] Connection refused
It seems that, in the context of restore-cluster, this could be a debug-level log. Ref: https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/cassandra_utils.py#L723
Could you please look into this and let us know the reason for the error? Let us know if any further information is needed.
┆Issue is synchronized with this Jira Task by Unito ┆Issue Number: K8SSAND-1648 ┆priority: Medium ┆Link To Issue: https://k8ssandra.atlassian.net/browse/K8SSAND-1648
I agree debug level is more appropriate, considering the calling function uses debug logging in case the above happens. Maybe this should be moved to info level then, in order to make it obvious to ops what's going on.
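One way to read the suggestion above is that the caller, rather than the low-level port check, tells ops what is happening at info level. Here is a rough sketch of that idea. The `wait_until_down` helper, its parameters, and the logger name are hypothetical and are not taken from Medusa's code; the port check is passed in as a callable so the sketch stays self-contained.

```python
import logging
import time

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("restore_sketch")


def wait_until_down(host, ports, is_port_open, attempts=5, delay=1.0):
    """Poll until every port in `ports` is closed, reporting progress at INFO.

    `is_port_open` is any callable taking (host, port) and returning a bool.
    Returns True once all ports are closed, False if attempts run out.
    """
    for attempt in range(1, attempts + 1):
        still_open = [p for p in ports if is_port_open(host, p)]
        if not still_open:
            # Closed ports are the expected state before a restore.
            logger.info("Cassandra on %s is down (ports %s closed)", host, ports)
            return True
        logger.info(
            "Attempt %d/%d: ports %s on %s are still open, waiting",
            attempt, attempts, still_open, host,
        )
        time.sleep(delay)
    return False
```

With something like this, a closed port during restore would show up as a single informational line from the caller instead of an ERROR with a traceback.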
Hi @adejanovski - Thanks for your response. The change will definitely help the ops team. Hope it gets into the code soon. Regards, Kaushal
Closing due to inactivity. Please reach out if this is still bugging you.
I just had a customer bring this up. It may be worthwhile to log the ConnectionRefusedError and socket.error exceptions at debug level instead of error level. The behaviour seems to be expected based on the comments here: https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/cassandra_utils.py#L819-L825
I think the is_open function is just being used to check that Cassandra is down before restoring. If so, it should be safe to change the log message to say that the health check confirms Cassandra is down, as expected.
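To make the proposal concrete, here is a minimal sketch of a port check that logs a refused connection at debug level by default. It is not Medusa's actual is_open implementation: the `timeout` and `closed_level` parameters, the logger name, and the message wording are assumptions for illustration. It relies on the fact that in Python 3 both ConnectionRefusedError and socket.error are OSError (the former a subclass, the latter an alias).

```python
import logging
import socket

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("port_check_sketch")


def is_open(host, port, timeout=2.0, closed_level=logging.DEBUG):
    """Return True if a TCP connection to host:port succeeds, else False.

    A failed connection is logged at `closed_level` (DEBUG by default)
    without a traceback, since a closed port is expected while Cassandra
    is stopped for a restore.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers ConnectionRefusedError, timeouts, and socket.error.
        logger.log(
            closed_level,
            "Port %s on host %s is closed (expected while Cassandra is down): %s",
            port, host, exc,
        )
        return False
```

A caller that genuinely needs the port to be open could pass `closed_level=logging.ERROR` to keep the current severity for that case.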
Hi @philipfischbacher! Great that you have reached out. I've been unhappy about these error messages myself. I'll reopen the issue, rephrase the ask a bit, and amend the initial issue. Hopefully, we'll get to do this soon.
Hi @rzvoncek, thanks for the quick reply! If I can manage it, I can try to create a PR for this change.