scylla-cluster-tests
fix(nemesis): skip the `mgmt_restore` nemesis as unstable
Testing
- [ ]
PR pre-checks (self review)
- [x] I added the relevant `backport` labels
- [x] I didn't leave commented-out/debugging code
Reminders
- Add new configuration options and document them (in `sdcm/sct_config.py`)
- Add unit tests to cover my changes (under the `unit-test/` folder)
- Update the Readme/doc folder relevant to this change (if needed)
@mykaul & @tzach, please approve this change. It drops the current coverage of the `mgmt_restore` nemesis, which is causing instability in regression testing.
Someone from the Manager team probably needs to map all the cases in which this nemesis can fail and decide how to proceed. For example:
- Restoring OSS to Enterprise: should it work, and how far back? (Only matching releases, e.g. 5.4 to 2024.1, or also 5.2 to 2024.1?)
- Restoring Enterprise to Enterprise, and how far back.
- Restoring a non-encrypted backup to a fully encrypted cluster.
- Restoring a backup taken in one region to a cluster that runs in another region (the restore works, but reads fail until someone realizes the keyspace must be altered to the correct region).
- Etc.
The problem is not just the test: we have a real regression in restore that we need to fix in 5.4. The test should be fixed to capture this issue.
The test is what found the problem, and it's one of the edge cases I mentioned above.
However, the test finds too many problems related to all the questions above, and there is a single general issue tracking the test failure (currently assigned to @rayakurl).
@roydahan, @vponomaryov - @dkropachev is already working on fixing the problem with the nemesis. You can monitor the progress in https://github.com/scylladb/scylla-cluster-tests/pull/7029. This PR should be closed.
Let's merge https://github.com/scylladb/scylla-cluster-tests/pull/7029 instead of disabling it; it will make the nemesis work for Scylla up to 5.2.
For Scylla 5.4, the restore procedure does not work due to https://github.com/scylladb/scylladb/issues/16349, so you will probably want to disable it for 5.4.
If so, we do need to disable it on master. FYI, it never reached the 5.2 branch; it wasn't ready when that branch started.
If it's broken for 5.4 and 2024.1, we should disable it until it's proven to work.
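Disabling the nemesis only on the affected release lines, rather than everywhere, could be expressed as a simple version gate. This is a minimal sketch, assuming the nemesis should be skipped on OSS 5.4 and later and on Enterprise 2024.1 and later (because of scylladb/scylladb#16349), and assuming Enterprise versions can be recognized by a calendar-year major. The names `should_skip_mgmt_restore`, `parse_major_minor` and the threshold constants are hypothetical; this is not SCT's actual API or the change in #7029.

```python
import re
from typing import Tuple

# Hypothetical thresholds: first release lines assumed broken by scylladb#16349.
OSS_FIRST_BROKEN = (5, 4)
ENTERPRISE_FIRST_BROKEN = (2024, 1)
# Assumption: Enterprise majors are calendar years (2023, 2024, ...).
ENTERPRISE_MIN_MAJOR = 2000


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like '5.4.1' or '2024.1.0~rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def should_skip_mgmt_restore(version: str) -> bool:
    """Hypothetical gate: True if the mgmt_restore nemesis should be skipped."""
    major_minor = parse_major_minor(version)
    if major_minor[0] >= ENTERPRISE_MIN_MAJOR:
        # Enterprise line: compare against the first broken Enterprise release.
        return major_minor >= ENTERPRISE_FIRST_BROKEN
    # OSS line (including master builds newer than 5.4).
    return major_minor >= OSS_FIRST_BROKEN
```

Comparing `(major, minor)` tuples keeps the check independent of patch numbers and pre-release suffixes such as `~rc1`, so master builds newer than 5.4 are skipped as well until the restore is proven to work.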
We also need to consider the following bug:
- https://github.com/scylladb/scylla-cluster-tests/issues/7122
It may significantly affect the results, depending on which Manager snapshot is used for the restore operation.