scylla-cluster-tests

`disrupt_add_remove_dc` nemesis breaks the `disrupt_add_drop_column` one when they run in parallel

Open vponomaryov opened this issue 11 months ago • 1 comments

Issue description

  • [ ] This issue is a regression.
  • [ ] It is unknown if this issue is a regression.

If we run the disrupt_add_remove_dc nemesis in parallel with the disrupt_add_drop_column one, we can get the following error:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5116, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2449, in disrupt_add_drop_column
    self._add_drop_column_run_in_cycle()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2178, in _add_drop_column_run_in_cycle
    self._add_drop_column()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2144, in _add_drop_column
    self._add_drop_column_target_table = self._add_drop_column_get_target_table(
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2069, in _add_drop_column_get_target_table
    current_tables = self._get_all_tables_with_no_compact_storage(self._add_drop_column_tables_to_ignore)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2060, in _get_all_tables_with_no_compact_storage
    tables = get_db_tables(session, ks, with_compact_storage=False)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 2101, in get_db_tables
    for table in list(session.cluster.metadata.keyspaces[ks].tables.keys()):
KeyError: 'keyspace_new_dc'

It is caused by a race between the addition of a new keyspace (disrupt_add_remove_dc nemesis) and the update of the driver session's metadata, combined with unsafe code (disrupt_add_drop_column nemesis) that assumes the driver's session already knows about the newly added keyspace.
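The failure mode can be illustrated without a cluster. This is only a sketch: the dict below stands in for `session.cluster.metadata.keyspaces`, and the list of keyspace names is a hypothetical fresher view of the cluster. Where the real code gets its keyspace names from is an assumption here, not something taken from the traceback.

```python
from types import SimpleNamespace

# Stand-in for session.cluster.metadata.keyspaces: a snapshot cached
# before the other nemesis created 'keyspace_new_dc'.
cached_keyspaces = {
    "keyspace1": SimpleNamespace(tables={"standard1": object()}),
}

# Hypothetical fresher view that already includes the new keyspace.
keyspace_names = ["keyspace1", "keyspace_new_dc"]


def list_tables_unsafe(names, keyspaces):
    """Mimic the failing lookup: index the cached metadata without checking."""
    found = []
    for ks in names:
        # Raises KeyError when the cached snapshot has not seen `ks` yet.
        for table in list(keyspaces[ks].tables.keys()):
            found.append(f"{ks}.{table}")
    return found


try:
    list_tables_unsafe(keyspace_names, cached_keyspaces)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

The direct `keyspaces[ks]` indexing is the same pattern as the last frame of the traceback, so any keyspace missing from the cached snapshot aborts the whole nemesis.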

Steps to Reproduce

  1. Run any longevity test in which the two nemeses mentioned above run in parallel
  2. See the error

Expected behavior: The disrupt_add_drop_column nemesis should not fail with the KeyError: 'keyspace_new_dc' error.

Actual behavior: The disrupt_add_drop_column nemesis fails with the KeyError: 'keyspace_new_dc' error when it runs in parallel with the disrupt_add_remove_dc one.

Impact

How frequently does it reproduce?

Installation details

SCT Version: master
Scylla version (or git commit hash): master/any

Logs

vponomaryov Mar 04 '24 12:03

So we need to refresh the session, and fall back to the next keyspace if we fail to find it.
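A rough sketch of that idea, not the actual fix from SCT: `get_tables_with_fallback`, `get_keyspaces` and `refresh` are hypothetical names used only for illustration. With the Python driver, `get_keyspaces` could be `lambda: session.cluster.metadata.keyspaces` and `refresh` could be, for example, `session.cluster.refresh_schema_metadata`. Here both are replaced by plain stand-ins so the snippet runs on its own.

```python
from types import SimpleNamespace


def get_tables_with_fallback(keyspace_names, get_keyspaces, refresh=None):
    """Collect 'ks.table' names, skipping keyspaces the metadata does not know.

    keyspace_names: keyspaces to inspect.
    get_keyspaces:  callable returning the current name -> keyspace metadata mapping.
    refresh:        optional callable that refreshes the metadata (hypothetical hook).
    """
    tables = []
    for ks in keyspace_names:
        ks_meta = get_keyspaces().get(ks)
        if ks_meta is None and refresh is not None:
            # Keyspace not known yet: refresh once and look again.
            refresh()
            ks_meta = get_keyspaces().get(ks)
        if ks_meta is None:
            # Still unknown: fall back to the next keyspace instead of failing.
            continue
        tables.extend(f"{ks}.{name}" for name in list(ks_meta.tables.keys()))
    return tables


# Stand-in metadata that never learns about 'keyspace_new_dc'.
metadata = {"keyspace1": SimpleNamespace(tables={"standard1": None})}
refresh_calls = []

result = get_tables_with_fallback(
    ["keyspace_new_dc", "keyspace1"],
    get_keyspaces=lambda: metadata,
    refresh=lambda: refresh_calls.append(1),
)
print(result)
```

Using `.get()` instead of indexing means a keyspace that is added or dropped concurrently only shrinks the candidate list rather than raising.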

fruch Mar 04 '24 19:03

https://github.com/scylladb/scylla-cluster-tests/pull/7565

juliayakovlev Jun 05 '24 07:06