Israel Fruchter
@rzetelskik @vponomaryov @slivne, here's a dtest that reproduces it by blocking gossip traffic TO the node being decommissioned: https://gist.github.com/fruch/7487febf59afd60a7972a08e4ef44daf The decommission starts on node `127.0.77.3` when it doesn't see the...
> @fruch to further specify my comment above - I don't think it's enough to introduce such iptables rule to reproduce this exact issue. I don't have enough expertise to...
@rzetelskik I was blocking only packets targeting the gossip port on the node being decommissioned, meaning all other communication from that node to the other nodes is working. So nodes...
> > I was blocking only packet targeting the gossip port on the node being decommissioned, meaning all other communication from that node to the other node is working. so...
> > I'm blocking node1, node2 --> 7000 node3 > > ... > > so the node_ops command can go from node3 to node1 and node2, > > Yes, node_ops...
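For readers following the thread, the blocking described above (node1, node2 --> port 7000 on node3) might look roughly like the sketch below. It is not the linked dtest: the helper name `gossip_block_rules` and the node1/node2 addresses `127.0.77.1` and `127.0.77.2` are assumptions made for illustration; only `127.0.77.3` and port 7000 come from the discussion. The sketch only builds the `iptables` command lines and does not execute them.

```python
import shlex

# Port named in the thread ("node1, node2 --> 7000 node3").
GOSSIP_PORT = 7000


def gossip_block_rules(target_ip, source_ips, port=GOSSIP_PORT):
    """Build iptables commands that drop TCP packets sent from each
    source address to target_ip:port. Hypothetical helper, not part of
    the dtest referenced in the thread."""
    return [
        ["iptables", "-A", "INPUT", "-p", "tcp",
         "-s", src, "-d", target_ip,
         "--dport", str(port), "-j", "DROP"]
        for src in source_ips
    ]


if __name__ == "__main__":
    # 127.0.77.3 is the node being decommissioned (from the thread);
    # the two source addresses are assumed for illustration.
    rules = gossip_block_rules("127.0.77.3", ["127.0.77.1", "127.0.77.2"])
    for rule in rules:
        print(shlex.join(rule))
```

Because the rules match only packets destined for node3's port 7000, traffic that node3 sends to node1 and node2 is left untouched, which matches the one-directional block described in the comments.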
@rzetelskik the question is whether scylla-core is going to handle this, or whether the operator would need to wait for the cluster to be fully up (i.e. all nodes see each other) before issuing a decommission...
> Thanks for reporting this. > > > the installation instruction might reflect the situation, on how where to get collectd on ubuntu. > > Which part in your opinion...
> Note that it would be much more helpful to comment on the Ubuntu bug tracker as this is Ubuntu issue, not a collectd one. And maybe ask Ubuntu bug...
@asias we are now seeing this across the board in the master upgrade tests:
https://jenkins.scylladb.com/job/scylla-master/job/rolling-upgrade/job/rolling-upgrade-debian10-test/148/
https://jenkins.scylladb.com/job/scylla-master/job/rolling-upgrade/job/rolling-upgrade-debian11-test/12/
https://jenkins.scylladb.com/job/scylla-master/job/rolling-upgrade/job/rolling-upgrade-ubuntu20.04-test/88/
https://jenkins.scylladb.com/job/scylla-master/job/rolling-upgrade/job/rolling-upgrade-ubuntu18.04-test/216/
@asias, @fgelcer, @yarongilor Just adding some information I've noticed that isn't mentioned here on the bug. The actual failure on node1: ``` Sep 17 06:03:51 rolling-upgrade--ubuntu-focal-db-node-96c53cba-0-1 scylla[36181]: [shard 0] repair...