cassandra-medusa

restore-cluster command shows error during execution

Open kaushalkumar opened this issue 2 years ago • 2 comments

We see an error during execution of the restore-cluster command. The error message is:

[2022-07-13 04:48:55,164] ERROR: Host 'node1-dc1.abcd.com' is up, but port '9042' is closed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/medusa/cassandra_utils.py", line 715, in is_open
    s.connect((host, port))
  File "/usr/local/lib64/python3.6/site-packages/gevent/_socketcommon.py", line 607, in connect
    raise _SocketError(err, strerror(err))
ConnectionRefusedError: [Errno 111] Connection refused
[2022-07-13 04:48:55,166] ERROR: Host 'node1-dc1.abcd.com' is up, but port '7001' is closed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/medusa/cassandra_utils.py", line 715, in is_open
    s.connect((host, port))
  File "/usr/local/lib64/python3.6/site-packages/gevent/_socketcommon.py", line 607, in connect
    raise _SocketError(err, strerror(err))
ConnectionRefusedError: [Errno 111] Connection refused

It seems that, in the context of restore-cluster, this could be a debug-level log. Ref: https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/cassandra_utils.py#L723

Can you please look into this and let us know the reason for the error? Do let us know if any more info is needed.

┆Issue is synchronized with this Jira Task by Unito ┆Issue Number: K8SSAND-1648 ┆priority: Medium ┆Link To Issue: https://k8ssandra.atlassian.net/browse/K8SSAND-1648

kaushalkumar, Jul 13 '22 10:07

I agree debug level is more appropriate, considering the calling function uses debug logging in case the above happens. Maybe this should be moved to info level then, in order to make it obvious to ops what's going on.

adejanovski, Jul 25 '22 12:07

Hi @adejanovski - Thanks for your response. The change will definitely assist the ops team. Hope it gets into the code soon. Regards, Kaushal

kaushalkumar, Jul 27 '22 05:07

Closing due to inactivity. Please reach out if this is still bugging you.

rzvoncek, Mar 05 '24 10:03

I just had a customer bring this up. It may be worthwhile to log the ConnectionRefusedError and socket.error exceptions at debug level instead of error level. The behaviour seems to be expected based on the comments here: https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/cassandra_utils.py#L819-L825

I think the is_open function is only being used to check that Cassandra is down before restoring. If so, it should be safe to change the log message to say that the health check confirms Cassandra is down, as expected.
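
To make the suggestion concrete, here is a minimal sketch of a port check that logs a refused connection at a caller-chosen level. It is not Medusa's actual code: the name is_open only mirrors the function in the traceback, and the timeout and closed_log_level parameters are assumptions made up for this example.

```python
import logging
import socket


def is_open(host, port, timeout=1.0, closed_log_level=logging.DEBUG):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical sketch, not the Medusa implementation. `timeout` and
    `closed_log_level` are illustrative parameters.
    """
    try:
        # create_connection resolves the host and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # ConnectionRefusedError and socket.timeout are OSError subclasses,
        # and socket.error is an alias of OSError.
        # During a restore a closed port is the expected state, so the
        # message is logged at the level the caller asks for, without
        # a traceback.
        logging.log(
            closed_log_level,
            "Host '%s' port '%s' is closed: %s", host, port, exc,
        )
        return False
```

A restore path that expects Cassandra to be down could keep the default, while a caller that expects the node to be up could pass closed_log_level=logging.ERROR.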

philipfischbacher, Mar 06 '24 07:03

Hi @philipfischbacher! Great that you have reached out. I've been unhappy about these error messages myself. I'll reopen the issue, rephrase the ask a bit, and amend the initial issue. Hopefully, we'll get to do this soon.

rzvoncek, Mar 06 '24 11:03

Hi @rzvoncek, thanks for the quick reply! If I can manage it, I can try to create a PR for this change.

philipfischbacher, Mar 07 '24 08:03