Roy Dahan comments

Results: 453 comments by Roy Dahan

take_snapshot failed: std::runtime_error (Keyspace system_schema: snapshot sm_20220929153521UTC already exists - snap. creation takes>5m

I can move this issue to scylla-manager, and @tzach can decide there whether the timeout needs to be increased and, if so, by how much.

take_snapshot failed: std::runtime_error (Keyspace system_schema: snapshot sm_20220929153521UTC already exists - snap. creation takes>5m

Actually, I can't move it; I don't think I have sufficient permissions there.

take_snapshot failed: std::runtime_error (Keyspace system_schema: snapshot sm_20220929153521UTC already exists - snap. creation takes>5m

I agree.

take_snapshot failed: std::runtime_error (Keyspace system_schema: snapshot sm_20220929153521UTC already exists - snap. creation takes>5m

I'm not sure whether it's a transient issue that SCT should ignore or a real problem with Manager/Scylla. I have a job that reproduces the issue quite consistently....

After a successful schema and data restoration to a different region, the restored keyspace is completely empty

Still happens: https://argus.scylladb.com/test/f1ff65fd-8324-4264-8d28-8c7122fca836/runs?additionalRuns[]=5986619f-8479-4267-a92f-19c6b604f84b

repair fails after 2.5 hours on enterprise performance tests

Should we have an assignee here?

repair fails after 2.5 hours on enterprise performance tests

I don't know why, but the "Transfer Issue" option isn't available for issues in this repo.

One of the nodes fails with `No space left on device` during manager-related nemeses in 24-hour Cloud longevity test

It can't be the logs, because it happens on only one node and the used space goes both up and down. @ilya-rarov please correlate the used space...

One of the nodes fails with `No space left on device` during manager-related nemeses in 24-hour Cloud longevity test

@ilya-rarov it doesn't help to keep adding more and more information from reproducers. We need to find an exact reproducer. It seems the reproducer is the backup nemesis, not repair....

One of the nodes fails with `No space left on device` during manager-related nemeses in 24-hour Cloud longevity test

I don't know. Maybe the manager performs its operations on the nodes sequentially rather than in parallel. Another way to debug it is to keep the cluster alive and check...