Restoring a 3.1 backup with 3.2 fails, with files from the bucket appearing to be missing
Issue description
- [x] This issue is a regression.
- [ ] It is unknown if this issue is a regression.
At first, while using manager 3.1.2, we created a backup, which completed successfully:
< t:2023-08-16 19:54:15,367 f:base.py l:142 c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool backup -c 4105638a-e5a7-4c68-b8c2-614904a59131 --keyspace keyspace1 --location s3:manager-backup-tests-us-east-1 --cron '56 19 * * *' " finished with status 0
...
< t:2023-08-16 19:56:24,300 f:remote_base.py l:520 c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "sudo sctool -c 4105638a-e5a7-4c68-b8c2-614904a59131 progress backup/6d118b0f-2652-4258-a797-9a191b0b26f2"...
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Run: ec1624fd-3c6e-11ee-bd30-02b041669e61
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Status: DONE
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Start time: 16 Aug 23 19:56:00 UTC
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > End time: 16 Aug 23 19:56:12 UTC
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Duration: 12s
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Progress: 100%
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Snapshot Tag: sm_20230816195600UTC
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Datacenters:
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > - us-east
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG >
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > ╭─────────────┬──────────┬──────────┬──────────┬──────────────┬────────╮
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ Host │ Progress │ Size │ Success │ Deduplicated │ Failed │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > ├─────────────┼──────────┼──────────┼──────────┼──────────────┼────────┤
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ 10.12.0.108 │ 100% │ 283.379M │ 283.379M │ 0 │ 0 │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ 10.12.1.151 │ 100% │ 283.379M │ 283.379M │ 0 │ 0 │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ 10.12.1.249 │ 100% │ 283.377M │ 283.377M │ 0 │ 0 │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > ╰─────────────┴──────────┴──────────┴──────────┴──────────────┴────────╯
Afterwards, we upgraded the manager to 3.2.0:
< t:2023-08-16 20:01:12,692 f:remote_base.py l:520 c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "sudo sctool version"...
< t:2023-08-16 20:01:12,770 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Client version: 3.2.0-0.20230815.9db309c5
< t:2023-08-16 20:01:12,772 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Server version: 3.2.0-0.20230815.9db309c5
And attempted to restore the aforementioned backup:
< t:2023-08-16 20:05:57,707 f:base.py l:142 c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool restore -c 4105638a-e5a7-4c68-b8c2-614904a59131 --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20230816195600UTC" finished with status 0
< t:2023-08-16 20:05:57,708 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > sctool output: restore/03db246a-0533-4a91-9079-9ecdf41884f2
But the restore failed, reporting that files are missing from the bucket:
< t:2023-08-16 20:06:35,842 f:base.py l:142 c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool -c 4105638a-e5a7-4c68-b8c2-614904a59131 progress restore/03db246a-0533-4a91-9079-9ecdf41884f2" finished with status 0
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > sctool output: Restore progress
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > Run: 500d5a78-3c70-11ee-8668-02b041669e61
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > Status: ERROR (restoring backed-up data)
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > Cause: not restored bundles [3 2]: wait on rclone job, id: 1692216016, host
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > 10.12.0.108: job error (1692216016): failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-Data.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-Statistics.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-Summary.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-TOC.txt failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-CRC.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Data.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Digest.crc32 failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Filter.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Index.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Scylla.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Statistics.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Summary.db failed to open source object: object not found; backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-TOC.txt failed to open source object: object not found
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > Start time: 16 Aug 23 20:05:57 UTC
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > End time: 16 Aug 23 20:06:05 UTC
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > Duration: 8s
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > Progress: 0% | 0%
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > Snapshot Tag: sm_20230816195600UTC
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG >
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > ╭───────────┬──────────┬──────────┬─────────┬────────────┬────────╮
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > │ Keyspace │ Progress │ Size │ Success │ Downloaded │ Failed │
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > ├───────────┼──────────┼──────────┼─────────┼────────────┼────────┤
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > │ keyspace2 │ 0% | 0% │ 849.762M │ 0 │ 0 │ 0 │
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > │ keyspace1 │ 0% | 0% │ 849.861M │ 0 │ 0 │ 0 │
< t:2023-08-16 20:06:35,843 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > ╰───────────┴──────────┴──────────┴─────────┴────────────┴────────╯
To summarize, the files that the restore task reports as missing are:
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-Data.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-Statistics.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-Summary.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-2-big-TOC.txt
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-CRC.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Data.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Digest.crc32
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Filter.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Index.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Scylla.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Statistics.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-Summary.db
backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/me-3-big-TOC.txt
These files do indeed appear to be missing from the bucket:
https://s3.console.aws.amazon.com/s3/buckets/manager-backup-tests-us-east-1?region=us-east-1&prefix=backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/&showversions=false
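For reference, the same prefix can also be checked from the command line. A minimal sketch, assuming the AWS CLI is configured with read access to the test bucket (bucket name and prefix copied from the restore error above):
# assumes AWS CLI read access to manager-backup-tests-us-east-1; prefix taken from the restore error
$ aws s3 ls s3://manager-backup-tests-us-east-1/backup/sst/cluster/0fa19c7a-d21a-4bff-9f19-80c33dd2ae15/dc/us-east/node/1719707d-6da3-4cc6-b2b8-a8cef613b028/keyspace/keyspace2/table/standard1/c34e94003c6e11eeaf7f8fd4eda90fdb/
A listing that lacks the me-2-* and me-3-* components above would confirm the objects are really absent rather than merely unreadable.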
Impact
Restoring older backups would not be possible.
Installation details
Kernel Version: 5.15.0-1034-aws
Scylla version (or git commit hash): 5.0.13-20230423.a0ca8abe4
with build-id 1196f46d1be44927c95ac67703df3ab2ef98a332
Initial manager version: 3.1.2-0.20230704.bd349aa4 Manager version after upgrade: 3.2.0-0.20230815.9db309c5
Cluster size: 3 nodes (i4i.large)
Scylla Nodes used in this run:
- manager-upgrade-manager--db-node-c60a1216-3 (44.197.244.212 | 10.12.1.151) (shards: 2)
- manager-upgrade-manager--db-node-c60a1216-2 (44.203.9.9 | 10.12.0.108) (shards: 2)
- manager-upgrade-manager--db-node-c60a1216-1 (44.195.38.80 | 10.12.1.249) (shards: 2)
OS / Image: ami-0659419f07e694eaf
(aws: undefined_region)
Test: ubuntu20-upgrade-test
Test id: c60a1216-9e68-4134-a42a-a34ce6a71a45
Test name: manager-3.2/ubuntu20-upgrade-test
Test config file(s):
Logs and commands
- Restore Monitor Stack command:
$ hydra investigate show-monitor c60a1216-9e68-4134-a42a-a34ce6a71a45
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command:
$ hydra investigate show-logs c60a1216-9e68-4134-a42a-a34ce6a71a45
Logs:
- db-cluster-c60a1216.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c60a1216-9e68-4134-a42a-a34ce6a71a45/20230816_200800/db-cluster-c60a1216.tar.gz
- sct-runner-events-c60a1216.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c60a1216-9e68-4134-a42a-a34ce6a71a45/20230816_200800/sct-runner-events-c60a1216.tar.gz
- sct-c60a1216.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c60a1216-9e68-4134-a42a-a34ce6a71a45/20230816_200800/sct-c60a1216.log.tar.gz
- loader-set-c60a1216.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c60a1216-9e68-4134-a42a-a34ce6a71a45/20230816_200800/loader-set-c60a1216.tar.gz
- monitor-set-c60a1216.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c60a1216-9e68-4134-a42a-a34ce6a71a45/20230816_200800/monitor-set-c60a1216.tar.gz
- Scylla 5.0.13 is not supported anymore.
- So if the files are not in the bucket, where are they? Who deleted them, or were they never backed up properly in the first place?
< t:2023-08-16 19:54:15,367 f:base.py l:142 c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool backup -c 4105638a-e5a7-4c68-b8c2-614904a59131 --keyspace keyspace1 --location s3:manager-backup-tests-us-east-1 --cron '56 19 * * *' " finished with status 0
Well, only keyspace1 is being backed up, so it is not surprising that files from keyspace2 are missing.
I'm investigating why SM would try to restore keyspace2 if it wasn't part of the backup.
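One way to verify which keyspaces a snapshot actually contains is to list its files with sctool. A minimal sketch, assuming the backup files subcommand with --location and --snapshot-tag flags behaves as documented for this sctool version:
# assumption: sctool backup files accepts --location and --snapshot-tag in this release
$ sudo sctool backup files -c 4105638a-e5a7-4c68-b8c2-614904a59131 --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20230816195600UTC
If only keyspace1 paths are printed, that would confirm keyspace2 was never part of this backup.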
What is interesting here is that the backup bucket contains two different clusters, 4105638a-e5a7-4c68-b8c2-614904a59131 and 0fa19c7a-d21a-4bff-9f19-80c33dd2ae15, and both of them have a backup with the same snapshot tag sm_20230816195600UTC. This is possible only if ~~there are 2 instances of SM running backup~~ two different clusters are being backed up into the same remote location at the very same time (up to a 1 second backup start time difference).
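For context, the snapshot tag appears to encode the backup start time with one-second resolution (sm_20230816195600UTC matches the 19:56:00 UTC start time above). A sketch of that assumption using GNU date:
# assumption: the tag is simply the UTC start time formatted to the second
$ date -u -d '2023-08-16 19:56:00 UTC' '+sm_%Y%m%d%H%M%SUTC'
sm_20230816195600UTC
Two backups of two different clusters started within the same second would therefore get identical tags in the same bucket.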
In this test scenario, the provided SM logs show that SM backed up cluster 4105638a-e5a7-4c68-b8c2-614904a59131 with snapshot tag sm_20230816195600UTC, but the following restore tried to restore the backup of cluster 0fa19c7a-d21a-4bff-9f19-80c33dd2ae15 with the same snapshot tag.
Even so, I would expect the restore to finish successfully, restoring the data from this second cluster, so it's still puzzling why files from that second cluster are missing.
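Listing the backups visible in the location should make the overlap explicit. A hedged example, assuming sctool backup list supports --all-clusters and --location in this version:
# assumption: --all-clusters lists backups of every cluster found under the given location
$ sudo sctool backup list -c 4105638a-e5a7-4c68-b8c2-614904a59131 --all-clusters --location s3:manager-backup-tests-us-east-1
Both cluster prefixes should then show up with the same sm_20230816195600UTC snapshot tag.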
@ShlomiBalalis could you verify whether what I'm saying makes sense?
Moreover, please describe what tests were conducted on this second cluster (0fa19c7a-d21a-4bff-9f19-80c33dd2ae15), because perhaps those test scenarios are actually responsible for deleting these files.
@ShlomiBalalis ping
@ShlomiBalalis
@ShlomiBalalis could you please reply to the comment https://github.com/scylladb/scylla-manager/issues/3548#issuecomment-1699124738 ? We need to understand if there is really a bug in Manager and whether we should include it in 3.2.2.
I have no idea where the second cluster came from. We only executed one cluster add in this test, which gave us 4105638a-e5a7-4c68-b8c2-614904a59131:
< t:2023-08-16 19:50:38,371 f:base.py l:142 c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool cluster add --host=10.12.1.249 --name=cluster_under_test --auth-token c60a1216-9e68-4134-a42a-a34ce6a71a45" finished with status 0
< t:2023-08-16 19:50:38,371 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > sctool output: 4105638a-e5a7-4c68-b8c2-614904a59131
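To double-check that only one cluster was ever registered with this Manager instance, the registered clusters can also be listed (sctool cluster list is a standard subcommand; the exact output format may differ by version):
$ sudo sctool cluster list
If it shows only 4105638a-e5a7-4c68-b8c2-614904a59131, then the second UUID was never registered with this Manager and must have ended up in the bucket some other way.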
I see that the second cluster is only mentioned with regard to the restore/03db246a-0533-4a91-9079-9ecdf41884f2 task, but that task was created using the 4105638a-e5a7-4c68-b8c2-614904a59131 cluster:
< t:2023-08-16 20:05:57,707 f:base.py l:142 c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool restore -c 4105638a-e5a7-4c68-b8c2-614904a59131 --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20230816195600UTC" finished with status 0
< t:2023-08-16 20:05:57,708 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > sctool output: restore/03db246a-0533-4a91-9079-9ecdf41884f2
The snapshot tag that the restore task uses, sm_20230816195600UTC, was created by the backup/6d118b0f-2652-4258-a797-9a191b0b26f2 task, which was also created using 4105638a-e5a7-4c68-b8c2-614904a59131:
< t:2023-08-16 19:54:15,367 f:base.py l:142 c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool backup -c 4105638a-e5a7-4c68-b8c2-614904a59131 --keyspace keyspace1 --location s3:manager-backup-tests-us-east-1 --cron '56 19 * * *' " finished with status 0
< t:2023-08-16 19:54:15,367 f:cli.py l:1126 c:sdcm.mgmt.cli p:DEBUG > sctool output: backup/6d118b0f-2652-4258-a797-9a191b0b26f2
...
< t:2023-08-16 19:56:24,300 f:remote_base.py l:520 c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "sudo sctool -c 4105638a-e5a7-4c68-b8c2-614904a59131 progress backup/6d118b0f-2652-4258-a797-9a191b0b26f2"...
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Run: ec1624fd-3c6e-11ee-bd30-02b041669e61
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Status: DONE
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Start time: 16 Aug 23 19:56:00 UTC
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > End time: 16 Aug 23 19:56:12 UTC
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Duration: 12s
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Progress: 100%
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Snapshot Tag: sm_20230816195600UTC
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > Datacenters:
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > - us-east
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG >
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > ╭─────────────┬──────────┬──────────┬──────────┬──────────────┬────────╮
< t:2023-08-16 19:56:24,451 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ Host │ Progress │ Size │ Success │ Deduplicated │ Failed │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > ├─────────────┼──────────┼──────────┼──────────┼──────────────┼────────┤
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ 10.12.0.108 │ 100% │ 283.379M │ 283.379M │ 0 │ 0 │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ 10.12.1.151 │ 100% │ 283.379M │ 283.379M │ 0 │ 0 │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > │ 10.12.1.249 │ 100% │ 283.377M │ 283.377M │ 0 │ 0 │
< t:2023-08-16 19:56:24,452 f:base.py l:228 c:RemoteLibSSH2CmdRunner p:DEBUG > ╰─────────────┴──────────┴──────────┴──────────┴──────────────┴────────╯
So I really don't know how 0fa19c7a-d21a-4bff-9f19-80c33dd2ae15 came into play.
So from my perspective, this looks like there was concurrent test execution or something similar. @ShlomiBalalis was this issue ever reproduced? If not, could we rerun it and see if it still happens?
@ShlomiBalalis ping
@mikliapko is this something that you could take care of? I mean investigating the reasons for two clusters being there.