cassandra-medusa
Failed to find uploaded object
We are running Medusa on a single-rack cluster of 9 nodes. After running full cluster backups, Medusa fails after a long time. We noticed that it always fails on the same object (300GB). I can confirm the object does exist in the AWS S3 bucket, but I'm not sure why Medusa says it failed to find the uploaded object. Could you please help? This is a production cluster. We are using Cassandra 3.11.2.
[2022-11-18 22:18:30,998] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-29511-big-Data.db (30.297GiB)
[2022-11-18 22:18:31,091] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-29511-big-Index.db (50.202MiB)
[2022-11-18 22:19:27,206] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-29511-big-Summary.db (4.068KiB)
[2022-11-18 22:19:27,253] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-29511-big-Digest.crc32 (9.000B)
[2022-11-18 22:19:27,293] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-29511-big-Filter.db (30.117KiB)
[2022-11-18 22:19:27,386] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-Statistics.db (17.652KiB)
[2022-11-18 22:19:27,457] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-TOC.txt (92.000B)
[2022-11-18 22:19:27,532] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-CompressionInfo.db (4.201MiB)
[2022-11-18 22:19:32,222] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-Data.db (6.882GiB)
[2022-11-18 22:34:40,281] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-Index.db (10.072MiB)
[2022-11-18 22:34:51,848] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-Summary.db (866.000B)
[2022-11-18 22:34:51,897] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-Digest.crc32 (10.000B)
[2022-11-18 22:34:51,952] INFO: Uploading /data/Cassandra/3.11.2/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/snapshots/medusa-2022-11-18_1722-full/mc-37070-big-Filter.db (6.039KiB)
[2022-11-19 10:40:04,228] ERROR: Error occurred during backup: Failed to find uploaded object cluster-01/cass-01/2022-11-18_1722-full/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/mc-23596-big-Data.db in S3
[2022-11-19 10:40:05,815] ERROR: Issue occurred inside handle_backup Name: 2022-11-18_1722-full Error: Failed to find uploaded object cluster-01/cass-01/2022-11-18_1722-full/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/mc-23596-big-Data.db in S3
[2022-11-19 10:40:05,815] INFO: Updated from existing status: 0 to new status: 2 for backup id: 2022-11-18_1722-full
[2022-11-19 10:40:05,815] ERROR: Error occurred during backup: Failed to find uploaded object cluster-01/cass-01/2022-11-18_1722-full/data/production_exndb/res1-ac8b1b00218411ec9c63e7061b9094bf/mc-23596-big-Data.db in S3
We tried changing transfer_max_bandwidth to different values (50, 100, 250, and 500 MB/s), but that did not help. Increasing/decreasing the value of concurrent_transfers did not help either.
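For anyone trying to reproduce this, both knobs sit in the [storage] section of medusa.ini as far as I can tell; here is a minimal Python sketch for flipping them (the file path and the values are assumptions, adjust to your own install):

import configparser

path = "/etc/medusa/medusa.ini"          # assumed location; use your own
config = configparser.ConfigParser()
config.read(path)

if not config.has_section("storage"):
    config.add_section("storage")

# The two settings we experimented with; values are illustrative, not recommendations.
config["storage"]["transfer_max_bandwidth"] = "100MB/s"
config["storage"]["concurrent_transfers"] = "1"

with open(path, "w") as f:
    config.write(f)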
We are seeing the same symptom here and took the same actions, but the backup still fails on all nodes. We are using awscli 1.19.x and Medusa 0.13.4.
-Sheng
Hi @adejanovski
Could you please help here?
My best guess is that, depending on factors I'm really unsure of, when we exceed the maximum number of parts for a file (and awscli doesn't always seem to tune the chunk size by itself), the excess parts get ignored... silently. That's why we're not able to read the file through the API afterwards, which leads to the observed behavior.
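To put rough numbers on that guess (a sketch only; the 10,000-part cap is S3's documented limit and 8 MiB is awscli's default multipart_chunksize, neither value comes from these logs):

# Rough arithmetic only; MAX_PARTS and DEFAULT_CHUNK are documented S3/awscli
# defaults, not values read from this cluster's configuration.
MAX_PARTS = 10_000
DEFAULT_CHUNK = 8 * 1024 ** 2          # 8 MiB

ceiling = MAX_PARTS * DEFAULT_CHUNK
print(ceiling / 1024 ** 3)             # ~78 GiB: the largest file that fits with default chunks

sstable = 300 * 1024 ** 3              # roughly the 300 GB SSTable from this issue
print(sstable / DEFAULT_CHUNK)         # 38400.0 parts -> far past the cap

So any SSTable bigger than roughly 78 GiB would blow past the part limit unless the chunk size is raised.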
I think we can work around this by adding a flag to awscli that specifies the size of the file (as silly as it sounds, I would assume it should be able to detect it 🤷), to enforce a different split size so that we don't go over the maximum part count.
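As a hypothetical sketch of that workaround (not Medusa's actual code; it only assumes S3's 10,000-part limit and that the real file size is known up front):

import math

MAX_PARTS = 10_000               # S3's documented per-upload part limit
DEFAULT_CHUNK = 8 * 1024 ** 2    # awscli's default multipart_chunksize (8 MiB)

def safe_chunk_size(file_size: int) -> int:
    # Smallest chunk size (in bytes) that still fits the whole file into MAX_PARTS parts.
    return max(DEFAULT_CHUNK, math.ceil(file_size / MAX_PARTS))

print(safe_chunk_size(300 * 1024 ** 3) // 1024 ** 2)   # -> 30 (MiB per part) instead of 8

For uploads piped through stdin, awscli can do this derivation itself if it is told the size, via aws s3 cp - s3://bucket/key --expected-size <bytes>; whether that hint is enough in Medusa's case is the open question here.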
Our longer-term goal is to stop relying on awscli (or gsutil, or the Azure CLI) for multipart uploads by contributing to libcloud, in order to make multipart uploads safer and more efficient.
Hi, we no longer rely on CLI utils, so this might no longer be an issue.
@mohammad-aburadeh, in case you're still facing this, could you please give it a go with a recent Medusa (e.g. 0.20.1)?
Thanks @rzvoncek. I will upgrade Medusa.
I hope the new Medusa helped. I don't see any newly opened issues, so I suppose things are stable for now. Please don't hesitate to reach out in case you need more help.