cassandra-medusa
cassandra-medusa copied to clipboard
[WIP] Sanity checking around backing up files
While trying to deploy Medusa, i ran into a few issues. The root cause of #390 was an issue file systems permissions and the silent-failure property of python's glob()
This PR implements some of the debug logging that I wish I had had while troubleshooting and a basic sanity check to abort execution as soon as an error is observed rather than waiting for a (misleading) failure at a later point in execution.
┆Issue is synchronized with this Jira Task by Unito ┆friendlyId: K8SSAND-1398 ┆priority: Medium
Would ValueError
be more appropriate than the base Exception
that's being raised? I didn't see too many custom exception types in the Medusa code base and of the few, none really fit this case. There are several places in the code bases where raise Exception(error_msg)
is used so i'm not sure if there's some lint-disable comment i can leave there to prevent SonarCloud from objecting.
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells
No Coverage information
0.0% Duplication
@kiddom-kq, it looks like this broke the integration tests. You can run them locally using ./run_integration_tests.sh
, with a prereq to have ccm installed (pip install ccm
).
Let me know if you need some assistance to debug this.
Would
ValueError
be more appropriate than the baseException
that's being raised? I didn't see too many custom exception types in the Medusa code base and of the few, none really fit this case. There are several places in the code bases whereraise Exception(error_msg)
is used so i'm not sure if there's some lint-disable comment i can leave there to prevent SonarCloud from objecting.
Any improvement getting a more specific use of exceptions is welcome. Feel free to make any change necessary.
@adejanovski
I am having some trouble understanding the test.
./run_integration_tests.sh --test=16 --cassandra-version=2.2.19 -vv
That returns a failure because the assert
statement is thrown.
What is supposed to happen when find_dirs()
has no dirs?
Looking at the test:
Scenario Outline: Perform a differential backup over gRPC , verify its index, then delete it over gRPC with Jolokia
Given I have a fresh ccm cluster with jolokia "<client encryption>" running named "scenario16"
Given I am using "<storage>" as storage provider in ccm cluster "<client encryption>" with gRPC server
Then the gRPC server is up
When I create the "test" table in keyspace "medusa"
When I load 100 rows in the "medusa.test" table
When I run a "ccm node1 nodetool flush" command
When I perform a backup over gRPC in "differential" mode of the node named "grpc_backup"
Then the backup index exists
Then I verify over gRPC that the backup "grpc_backup" exists
Then I can see the backup index entry for "grpc_backup"
Then I can see the latest backup for "127.0.0.1" being called "grpc_backup"
Then I delete the backup "grpc_backup" over gRPC
Then I verify over gRPC the backup "grpc_backup" does not exist
Then I shutdown the gRPC server
It looks like When I perform a backup over gRPC in "differential" mode of the node named "grpc_backup"
is the first time that a backup is taken of the scenario16
cluster. If that is correct, how can a table with 100 rows in it have no directories that need to be backed up?
When I disable the assert, the execution continues to call add_backup_finish_to_index(storage, node_backup)
and then set_latest_backup_in_index(storage, node_backup)
Those functions put a record of the backup happening in the storage/index... even though nothing was backed up.
The next line of the test: Then the backup index exists
would see that something was added to the index and the test continues on happy to delete the (empty) backup that it just made.
Can you confirm that the differential
backup should return 0 directories?
@kiddom-kq, differential backups put the sstables into a data
directory that's at the same level than the backup directories. So you should have something like this:
While full backups use this layout:
Those 100 rows should definitely be backed up in the data
directory.
Hi @kiddom-kq, is this still something you're working on?
ping @kiddom-kq