
Backup is failing with error "az cli cp failed. Max attempts exceeded"

Open renoypaulose opened this issue 2 years ago • 0 comments

Medusa version: 13.1

Backup is failing on one of the nodes of a 3-node cluster with the error below.

[2022-06-15 04:00:50,762] ERROR: Error occurred during backup: az cli cp failed. Max attempts exceeded. Check /tmp/azcli_2e3a208f-4631-49d0-b65e-d59e559c699e.output for more informations.
Traceback (most recent call last):
  File "/home/cassandra/medusa/backup_node.py", line 369, in backup_snapshots
    manifest_objects += storage.storage_driver.upload_blobs(src_batch, dst_path)
  File "/home/cassandra/medusa/storage/azure_storage.py", line 70, in upload_blobs
    multi_part_upload_threshold=int(self.config.multi_part_upload_threshold)
  File "/home/cassandra/medusa/storage/azure_blobs_storage/concurrent.py", line 87, in upload_blobs
    return job.execute(list(src))
  File "/home/cassandra/medusa/storage/azure_blobs_storage/concurrent.py", line 51, in execute
    return list(executor.map(self.with_storage, iterables))
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/cassandra/medusa/storage/azure_blobs_storage/concurrent.py", line 60, in with_storage
    return self.func(self.storage, connection, iterable)
  File "/home/cassandra/medusa/storage/azure_blobs_storage/concurrent.py", line 83, in <lambda>
    storage, connection, src_file, dest, bucket, multi_part_upload_threshold
  File "/home/cassandra/medusa/storage/azure_blobs_storage/concurrent.py", line 116, in __upload_file
    obj = _upload_multi_part(storage, connection, src, bucket, full_object_name)
  File "/home/cassandra/medusa/storage/azure_blobs_storage/concurrent.py", line 137, in _upload_multi_part
    objects = azcli.cp_upload(srcs=[src], bucket_name=bucket.name, dest=object_name)
  File "/home/cassandra/medusa/storage/azure_blobs_storage/azcli.py", line 58, in cp_upload
    objects.append(self.upload_file(cmd, dest, azcli_output))
  File "/home/cassandra/.local/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/cassandra/.local/lib/python3.6/site-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/home/cassandra/.local/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/home/cassandra/.local/lib/python3.6/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/cassandra/.local/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/cassandra/medusa/storage/azure_blobs_storage/azcli.py", line 91, in upload_file
    azcli_output
OSError: az cli cp failed. Max attempts exceeded. Check /tmp/azcli_2e3a208f-4631-49d0-b65e-d59e559c699e.output for more informations.
:~$ cat /tmp/azcli_2e3a208f-4631-49d0-b65e-d59e559c699e.output
Alive[#############################################################   ]  96.7372%ERROR: The specified blob already exists.
RequestId:31108bdf-101e-004d-38a8-666f85000000
Time:2022-06-15T04:00:50.1915776Z
ErrorCode:BlobAlreadyExists
If you want to overwrite the existing one, please add --overwrite in your command.

The file that Medusa is trying to back up is already present in the Azure storage account.
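The failure mode can be sketched as follows. This is a minimal, self-contained simulation (hypothetical names, no real Azure calls, the plain retry loop standing in for the retrying decorator used in azcli.py): the first attempt registers the blob server-side but the transfer is interrupted, and because az storage blob upload refuses to overwrite an existing blob by default, every retry then hits BlobAlreadyExists until the attempt budget is exhausted.

```python
class BlobAlreadyExists(Exception):
    pass

existing_blobs = set()   # stands in for the storage account's blob listing
calls = {"n": 0}

def az_upload(blob_name, overwrite=False):
    """Stand-in for one `az storage blob upload` invocation."""
    calls["n"] += 1
    if blob_name in existing_blobs and not overwrite:
        raise BlobAlreadyExists(blob_name)
    existing_blobs.add(blob_name)        # blob is registered remotely ...
    if calls["n"] == 1:
        # ... but the very first transfer dies partway through
        raise ConnectionError("interrupted at 96.7%")

def upload_with_retries(blob_name, max_attempts=5, overwrite=False):
    """Mimics the retry wrapper around the az cli upload."""
    for attempt in range(1, max_attempts + 1):
        try:
            az_upload(blob_name, overwrite=overwrite)
            return attempt
        except (BlobAlreadyExists, ConnectionError):
            continue
    raise OSError("az cli cp failed. Max attempts exceeded.")
```

Without overwrite=True the retries can never succeed, which is why the surfaced error is the generic "Max attempts exceeded" rather than BlobAlreadyExists itself.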

I think the issue can be solved by adding --overwrite to the upload command built in medusa/storage/azure_blobs_storage/azcli.py, in the cp_upload function:

            cmd = self._az_cli_cmd + ["storage", "blob", "upload", "--overwrite", "-f", str(src), "-c", bucket_name, "-n", dest,
                                      "--content-md5", AbstractStorage.generate_md5_hash(src)]
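For illustration, the patched command construction could look like the sketch below. The build_upload_cmd helper, the bucket/blob names in the usage, and the local generate_md5_hash (standing in for AbstractStorage.generate_md5_hash, which is expected to yield the base64-encoded MD5 digest Azure wants for --content-md5) are all assumptions, not Medusa's actual code. Note that --overwrite is only understood by newer azure-cli releases, the same ones that made blob uploads non-overwriting by default.

```python
import base64
import hashlib

def generate_md5_hash(path):
    # Assumed equivalent of AbstractStorage.generate_md5_hash:
    # base64-encoded MD5 digest of the file contents.
    with open(path, "rb") as f:
        return base64.b64encode(hashlib.md5(f.read()).digest()).decode()

def build_upload_cmd(az_cli_cmd, src, bucket_name, dest):
    # Hypothetical helper mirroring the command list built in cp_upload.
    return az_cli_cmd + [
        "storage", "blob", "upload",
        "--overwrite",                     # the proposed fix
        "-f", str(src),
        "-c", bucket_name,
        "-n", dest,
        "--content-md5", generate_md5_hash(src),
    ]
```

With --overwrite in place, a retried upload simply replaces the partially-uploaded blob instead of failing with BlobAlreadyExists.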

┆Issue is synchronized with this Jira Task by Unito ┆friendlyId: K8SSAND-1580 ┆priority: Medium

renoypaulose avatar Jun 16 '22 10:06 renoypaulose