Kamil Braun
@rohitraj-carousellgroup you've got 3 separate clusters in there, connected into one. You can see that by `group_id`: 3 different values are appearing there. There should be only one `group_id` in...
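For illustration (a sketch, not something from this thread): one way to check which group0 each node thinks it belongs to is to read the group0 ID from every node and compare. I'm assuming here that the ID is readable from `system.scylla_local` under the `raft_group0_id` key, and the node addresses below are made up:

```python3
# Sketch: compare the group0 ID that each node reports. A healthy cluster
# prints the same value everywhere; several distinct values means several
# separate Raft clusters glued together.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import WhiteListRoundRobinPolicy

nodes = ["10.4.10.195", "10.4.10.196", "10.4.10.197"]  # made-up addresses

for node in nodes:
    # Pin the session to a single node so the local system table is read from it.
    profile = ExecutionProfile(load_balancing_policy=WhiteListRoundRobinPolicy([node]))
    cluster = Cluster(contact_points=[node],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()
    row = session.execute(
        "SELECT value FROM system.scylla_local WHERE key = 'raft_group0_id'"
    ).one()
    print(f"{node}: group0 id = {row.value if row else None}")
    cluster.shutdown()
```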
> Anyway, the only way I currently see to fix your cluster is to perform the manual Raft recovery procedure:

In your special case there are no dead nodes, so...
The manual Raft recovery procedure will not affect availability of queries, assuming that:
- you're using RF >= 3
- you're using CL
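As an illustration of the availability argument (not part of the original comment, and the consistency level below is just an example, since the exact condition is cut off above): with RF >= 3, a QUORUM query still succeeds while one replica is unavailable, e.g. while a node is going through the recovery procedure. Keyspace, table and the node address are made up:

```python3
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(contact_points=["10.4.10.195"])  # made-up address
session = cluster.connect()

stmt = SimpleStatement(
    "SELECT * FROM my_keyspace.my_table WHERE pk = 1",
    consistency_level=ConsistencyLevel.QUORUM,  # 2 out of 3 replicas is enough
)
rows = session.execute(stmt)
print(list(rows))
cluster.shutdown()
```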
@denesb I don't know if this is the root cause, but we need to replace all `cql.execute` with `cql.run_async`, which has timeouts adjusted for the slow ARM environments:
```python3
#...
```
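A minimal sketch of the kind of wrapper I mean; the helper name, the default timeout value and the callback plumbing below are illustrative, not the actual test-library implementation:

```python3
# Run a statement through the driver's async path with a timeout that is
# generous enough for slow ARM machines, instead of the driver default.
import asyncio
from cassandra.cluster import Session

SLOW_ENV_TIMEOUT = 120  # seconds; made-up value for slow ARM runners

async def run_async(session: Session, query: str, timeout: float = SLOW_ENV_TIMEOUT):
    """Execute `query` without blocking the event loop, with an adjusted timeout."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    # Bridge the driver's callback-based ResponseFuture into an asyncio future.
    def on_success(result):
        loop.call_soon_threadsafe(future.set_result, result)

    def on_error(exc):
        loop.call_soon_threadsafe(future.set_exception, exc)

    response_future = session.execute_async(query, timeout=timeout)
    response_future.add_callbacks(on_success, on_error)
    return await future
```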
Similar symptoms to https://github.com/scylladb/scylladb/issues/16668. In that issue, a node finished the replace operation but never seemed to transition to NORMAL from the point of view of other nodes. I suspected that...
In fact, this "Node 10.4.10.195 is in normal state" message, which we only see on 4 nodes and for some reason not on the others, comes from https://github.com/scylladb/scylladb/commit/5f44ae8371a382e51eb4bd7aa85628722bd528b6, which I sent...
But apparently my suspicion about the gossiper waiting on a lock is also false: we don't see "waiting for endpoint lock" in the logs.
Maybe node 3 got the gossiper notification, but it was broken somehow, with Host ID = null (as evidenced by the nodetool status output) and also STATUS = null.
I'm looking for clues in the git log. There were some non-trivial changes in how gossiper notifications are delivered, such as ad8a9104d859124447d847eeecf9e32c24da9e7c in December, which could be related.
The reason Host ID is shown as null for the joining node is that `nodetool status` uses the `/storage_service/host_id` HTTP endpoint, which uses `token_metadata::get_endpoint_to_host_id_map_for_reading()` underneath. If a node is not...
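For reference, this is roughly how one can hit that endpoint directly (the node address is made up, the REST API port is assumed to be the default 10000, and the exact response shape may differ):

```python3
import requests

# Ask a node's REST API for the endpoint -> host ID map that nodetool status uses.
resp = requests.get("http://10.4.10.195:10000/storage_service/host_id")
resp.raise_for_status()
# Expecting a list of {"key": <endpoint>, "value": <host id>} entries.
for entry in resp.json():
    print(entry["key"], "->", entry["value"])
```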