scylla-cluster-tests
`disrupt_add_remove_dc` nemesis breaks `disrupt_add_drop_column` one running in parallel
Issue description
- [ ] This issue is a regression.
- [ ] It is unknown if this issue is a regression.
If we run the `disrupt_add_remove_dc` nemesis in parallel with the `disrupt_add_drop_column` one, we can get the following error:
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5116, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2449, in disrupt_add_drop_column
self._add_drop_column_run_in_cycle()
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2178, in _add_drop_column_run_in_cycle
self._add_drop_column()
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2144, in _add_drop_column
self._add_drop_column_target_table = self._add_drop_column_get_target_table(
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2069, in _add_drop_column_get_target_table
current_tables = self._get_all_tables_with_no_compact_storage(self._add_drop_column_tables_to_ignore)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2060, in _get_all_tables_with_no_compact_storage
tables = get_db_tables(session, ks, with_compact_storage=False)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 2101, in get_db_tables
for table in list(session.cluster.metadata.keyspaces[ks].tables.keys()):
KeyError: 'keyspace_new_dc'
It is caused by a race between the creation of a new keyspace (by the `disrupt_add_remove_dc` nemesis) and the update of the driver session, combined with unsafe code in the `disrupt_add_drop_column` nemesis that assumes the driver's session already knows about the newly added keyspace.
Steps to Reproduce
- Run any longevity test in which the two nemeses mentioned above run in parallel
- See error
Expected behavior: The `disrupt_add_drop_column` nemesis should not fail with the `KeyError: 'keyspace_new_dc'` error.
Actual behavior: The `disrupt_add_drop_column` nemesis fails with the `KeyError: 'keyspace_new_dc'` error when running in parallel with the `disrupt_add_remove_dc` one.
Impact
How frequently does it reproduce?
Installation details
SCT Version: master
Scylla version (or git commit hash): master/any
Logs
- test_id: b3392b32-2b77-4a85-bf64-40a8a9e595e2
- job log: scylla-master/tier1/longevity-schema-topology-changes-12h-test
So we need to refresh the session and, if the keyspace still cannot be found, skip it and fall back to the next keyspace.
https://github.com/scylladb/scylla-cluster-tests/pull/7565
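
For illustration only, here is a minimal sketch of the "refresh, then skip" idea. It is not the code from the linked PR. The function name `get_tables_skipping_unknown_keyspaces` and its arguments are hypothetical; it assumes a session object shaped like the Python driver's, where `session.cluster.metadata.keyspaces` is a dict and `session.cluster.refresh_schema_metadata()` asks the driver to reload its schema metadata.

```python
import logging

LOGGER = logging.getLogger(__name__)


def get_tables_skipping_unknown_keyspaces(session, keyspaces):
    """Return {keyspace: [table names]}, tolerating keyspaces the driver does not know yet.

    Hypothetical helper: `keyspaces` is a list of keyspace names obtained elsewhere
    (for example from a query against the cluster), which may be newer than the
    schema metadata cached by the driver session.
    """
    tables_by_keyspace = {}
    for ks in keyspaces:
        # Use .get() instead of [ks] so a missing keyspace does not raise KeyError.
        ks_meta = session.cluster.metadata.keyspaces.get(ks)
        if ks_meta is None:
            # The cached metadata may be stale: refresh it and look again once.
            session.cluster.refresh_schema_metadata()
            ks_meta = session.cluster.metadata.keyspaces.get(ks)
        if ks_meta is None:
            # Still unknown (created or dropped concurrently): skip this keyspace.
            LOGGER.warning("Keyspace %r not found in driver metadata, skipping", ks)
            continue
        tables_by_keyspace[ks] = list(ks_meta.tables.keys())
    return tables_by_keyspace
```

With this shape, a keyspace such as `keyspace_new_dc` that another nemesis adds or removes concurrently is either picked up after the refresh or skipped with a warning, instead of failing the whole nemesis.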