Kamil Braun comments

Results: 302 comments of Kamil Braun

test.py: teardown may race with ongoing test operation, causing IPs to be released prematurely

@temichus I managed to create a reproducer: https://github.com/kbr-scylla/scylladb/commit/9551931e6ef2ac98112643e5a50deaacb2c2dfb5

The test is simple:

```python3
from test.pylib.manager_client import ManagerClient
import pytest
import logging
import asyncio

logger = logging.getLogger(__name__)

@pytest.mark.asyncio
async def test_concurrent_tasks(manager:...
```

test.py: teardown may race with ongoing test operation, causing IPs to be released prematurely

> And I honestly don't know how to handle this situation in ScyllaServer/ScyllaCluster/ScyllaClusterManager because even if we try to store all active tasks in classes in order to wait for...

test.py: teardown may race with ongoing test operation, causing IPs to be released prematurely

And even if creating a task required a yield, we'd just need to introduce a lock around checking/setting flag and task creation in that case. (But we shouldn't need a...
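A minimal sketch of the lock-based variant described above, for the hypothetical case where task creation could yield. The `TaskRegistry` class and its `spawn`/`stop` methods are illustrative names only, not part of test.py or ScyllaClusterManager. The lock makes "check the stopping flag, then create the task" atomic with respect to "set the stopping flag, then collect tasks".

```python
import asyncio
from typing import Optional


class TaskRegistry:
    """Hypothetical helper: tracks tasks and refuses new ones once stopping."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._stopping = False
        self._tasks = set()  # currently running asyncio.Task objects

    async def spawn(self, coro) -> Optional[asyncio.Task]:
        # Check the flag and create the task under the same lock, so stop()
        # cannot slip in between the check and the creation.
        async with self._lock:
            if self._stopping:
                coro.close()  # avoid a "coroutine was never awaited" warning
                return None
            task = asyncio.create_task(coro)
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
            return task

    async def stop(self) -> None:
        # Set the flag under the lock, then wait for everything that was
        # registered before the flag flipped.
        async with self._lock:
            self._stopping = True
            pending = list(self._tasks)
        await asyncio.gather(*pending, return_exceptions=True)
```

With this shape, teardown would call `stop()` before releasing resources such as IPs; any operation that tries to `spawn` afterwards gets `None` instead of a running task.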

dtest: topology_test.TestTopology.test_crash_during_decommission got unexpected errors: raft::request_aborted (Request is aborted by a caller)

> But according to @kbr-scylla there is a bigger problem with this than just the exception, it has unbounded run time.

Yes. In the logs I saw over 80 retries...

dtest: topology_test.TestTopology.test_crash_during_decommission got unexpected errors: raft::request_aborted (Request is aborted by a caller)

@enaydanov I agree with @gleb-cloudius 's suggestion. This IIUC should also cause this error to stop appearing.

dtest: topology_test.TestTopology.test_crash_during_decommission got unexpected errors: raft::request_aborted (Request is aborted by a caller)

Nice, thanks for the research. In that case, @enaydanov, we can simply limit the number of crashes of `node2`. Maybe to something like 10? In gossip mode, the decommission will...
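One way to picture the suggested cap is a loop bounded by a crash counter. This is only a sketch: `crash_node` and `decommission_finished` are hypothetical callables standing in for whatever the dtest actually does, and the default of 10 is simply the number floated above.

```python
def crash_until_done(crash_node, decommission_finished, max_crashes=10):
    """Crash the node repeatedly, but never more than max_crashes times.

    Both callables are hypothetical placeholders, not dtest APIs.
    Returns the number of crashes actually performed.
    """
    crashes = 0
    while crashes < max_crashes and not decommission_finished():
        crash_node()
        crashes += 1
    return crashes
```

The bound guarantees the loop terminates even if the decommission never completes while the node keeps crashing.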

dtest: topology_test.TestTopology.test_crash_during_decommission got unexpected errors: raft::request_aborted (Request is aborted by a caller)

Unassigning from myself: the test needs adjustment.

Sporadic repair failure when adding a node

> I'm just guessing here, but could we have a race where node B thinks it should repair a table (system_auth.roles) from node A, but node A doesn't have this...

Sporadic repair failure when adding a node

The out_of_range error appears in the "Repair follower=127.117.92.15" logs:

> WARN 2024-01-17 14:52:00,627 [shard 0: gms] repair - repair[a46545ed-cdcf-484f-855c-110ff142729d]: put_row_diff: got error from node=127.117.92.8, keyspace=system_auth, table=roles, range=(97021643797111883,497366115063960576], error=std::runtime_error (put_row_diff: Repair...

Sporadic repair failure when adding a node

Not so long ago there was a big change in repair (697cf41b9b123ad3afbd8c2ab029275329f6b11e); maybe it's related.