dvc push: fails for all commits if Azure remote url/account changed
Bug Report
Description
I have a repository that was configured with an Azure remote named dvcstore, pointing to a container URL azure://container-a on a storage account account1, and all tracked files are pushed there. I wanted to use a different storage account account2 and container name container-b going forward, and to re-push everything in history to the new location (expecting to be able to delete the old account later). I changed the remote config to the new URL. At that point I thought it would just be a matter of running dvc push -A -R, and while it does push all workspace files, it fails to push the historical tracked files from earlier commits, giving messages like ERROR: failed to transfer 'b4c7f7bc7bfafae5ecf58323f107674d' - Container does not exist. It seems to be using the config stored with those earlier commits (which has a different URL pointing to a container that, of course, doesn't exist in the new account), i.e. it's trying to push to azure://container-a rather than the new URL azure://container-b, despite the new config.
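One way to check this suspicion is to look at the remote URL recorded in .dvc/config at each commit. The following is a minimal sketch, assuming the usual DVC config layout (a ['remote "dvcstore"'] section with a url key); remote_url and list_remote_urls are hypothetical helper names, not DVC or git commands.

```shell
# Print the url of the named remote from a DVC config read on stdin.
# $1: remote name (e.g. dvcstore)
remote_url() {
    awk -v name="$1" '
        # A new section starts: remember whether it is the remote we want.
        /^\[/ { in_section = (index($0, "remote \"" name "\"") > 0); next }
        # Inside that section, print the value of the "url = ..." line.
        in_section && $1 == "url" && $2 == "=" { print $3; exit }
    '
}

# For every commit reachable from any ref, print the short hash and the
# remote URL stored in that commit's .dvc/config (if any).
# $1: remote name
list_remote_urls() {
    for rev in $(git rev-list --all); do
        url=$(git show "$rev:.dvc/config" 2>/dev/null | remote_url "$1")
        printf '%s %s\n' "$(git rev-parse --short "$rev")" "${url:-<no remote url>}"
    done
}
```

Running list_remote_urls dvcstore inside the repository from the steps below should list the old container URL for the earlier commits and the new one only for the latest commit, which would match the two different targets seen in the verbose log.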
Reproduce
az storage container create --name container-a --account-name <account_1_name> --account-key <account1_key>
az storage container create --name container-b --account-name <account_2_name> --account-key <account2_key>
mkdir dvc-push-test && cd dvc-push-test
git init
dvc init
git commit -m "Initialize DVC"
dvc remote add -d dvcstore azure://container-a
git add -A
git commit -m "Add DVC remote"
head -c 100 /dev/urandom > dummy
dvc add dummy
git add dummy.dvc .gitignore
git commit -m "Add dummy file"
export AZURE_STORAGE_ACCOUNT=<account_1_name>
export AZURE_STORAGE_KEY=<account1_key>
dvc push
head -c 100 /dev/urandom > dummy
dvc add dummy
git add dummy.dvc
git commit -m "Update dummy file"
dvc push
dvc remote modify dvcstore url azure://container-b
git add -A
git commit -m "Update DVC remote"
export AZURE_STORAGE_ACCOUNT=<account_2_name>
export AZURE_STORAGE_KEY=<account2_key>
dvc push -A -R
Output of dvc push -A -R:
ERROR: failed to transfer 'b4c7f7bc7bfafae5ecf58323f107674d' - Container does not exist.
ERROR: failed to transfer '38fe24bfb3b076c91a42e450e8a84b01' - Container does not exist.
ERROR: failed to push data to the cloud - 2 files failed to upload
Output of dvc push -A -R --verbose:
2023-12-08 12:46:14,788 DEBUG: v3.33.3 (brew), CPython 3.11.6 on macOS-13.5-arm64-arm-64bit
2023-12-08 12:46:14,788 DEBUG: command: /opt/homebrew/bin/dvc push -A -R --verbose
2023-12-08 12:46:14,972 DEBUG: Preparing to transfer data from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5' to 'azure://container-b/files/md5'
2023-12-08 12:46:14,972 DEBUG: Preparing to collect status from 'container-b/files/md5'
2023-12-08 12:46:14,972 DEBUG: Collecting status from 'container-b/files/md5'
2023-12-08 12:46:14,973 DEBUG: Querying 1 oids via object_exists
2023-12-08 12:46:15,434 DEBUG: Preparing to transfer data from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5' to 'azure://container-a/files/md5'
2023-12-08 12:46:15,434 DEBUG: Preparing to collect status from 'container-a/files/md5'
2023-12-08 12:46:15,434 DEBUG: Collecting status from 'container-a/files/md5'
2023-12-08 12:46:15,864 DEBUG: Estimated remote size: 4096 files
2023-12-08 12:46:15,865 DEBUG: Large remote ('2' oids < '4.096' traverse weight), using object_exists for remaining oids
2023-12-08 12:46:15,865 DEBUG: Querying 2 oids via object_exists
2023-12-08 12:46:16,165 DEBUG: Preparing to collect status from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5'
2023-12-08 12:46:16,165 DEBUG: Collecting status from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5'
2023-12-08 12:46:16,358 ERROR: failed to transfer '38fe24bfb3b076c91a42e450e8a84b01' - Container does not exist.
Traceback (most recent call last):
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1574, in _put_file
await bc.upload_blob(
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 419, in upload_blob
return await upload_block_blob(**options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 172, in upload_block_blob
process_storage_error(error)
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 82, in upload_block_blob
response = await client.upload(
^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_generated/aio/operations/_block_blob_operations.py", line 256, in upload
map_error(status_code=response.status_code, response=response, error_map=error_map)
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/exceptions.py", line 165, in map_error
raise error
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:665e3800-901e-004f-71fe-2904b5000000
Time:2023-12-08T17:46:16.3277730Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:665e3800-901e-004f-71fe-2904b5000000
Time:2023-12-08T17:46:16.3277730Z</Message></Error>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 334, in transfer
_try_links(
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 266, in _try_links
return copy(
^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 87, in copy
return _put(
^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 156, in _put
return _put_one(from_paths[0], to_paths[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 148, in _put_one
return put_file(from_path, to_path, callback=callback, **put_file_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 82, in func
return wrapped(path1, path2, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 54, in wrapped
res = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 547, in put_file
self.fs.put_file(os.fspath(from_file), to_info, callback=callback, **kwargs)
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_azure/spec.py", line 9, in put_file
return super().put_file(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1589, in _put_file
raise FileNotFoundError("Container does not exist.")
FileNotFoundError: Container does not exist.
2023-12-08 12:46:16,493 ERROR: failed to transfer 'b4c7f7bc7bfafae5ecf58323f107674d' - Container does not exist.
Traceback (most recent call last):
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1574, in _put_file
await bc.upload_blob(
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 419, in upload_blob
return await upload_block_blob(**options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 172, in upload_block_blob
process_storage_error(error)
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 82, in upload_block_blob
response = await client.upload(
^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_generated/aio/operations/_block_blob_operations.py", line 256, in upload
map_error(status_code=response.status_code, response=response, error_map=error_map)
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/exceptions.py", line 165, in map_error
raise error
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:665e3864-901e-004f-4efe-2904b5000000
Time:2023-12-08T17:46:16.4586980Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:665e3864-901e-004f-4efe-2904b5000000
Time:2023-12-08T17:46:16.4586980Z</Message></Error>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 148, in _put_one
return put_file(from_path, to_path, callback=callback, **put_file_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 82, in func
return wrapped(path1, path2, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 54, in wrapped
res = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 547, in put_file
self.fs.put_file(os.fspath(from_file), to_info, callback=callback, **kwargs)
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_azure/spec.py", line 9, in put_file
return super().put_file(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1589, in _put_file
raise FileNotFoundError("Container does not exist.")
FileNotFoundError: Container does not exist.
2023-12-08 12:46:16,788 ERROR: failed to push data to the cloud - 2 files failed to upload
Traceback (most recent call last):
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc/commands/data_sync.py", line 64, in run
processed_files_count = self.repo.push(
^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc/repo/__init__.py", line 60, in wrapper
return f(repo, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc/repo/push.py", line 144, in push
raise UploadError(failed_count)
dvc.exceptions.UploadError: 2 files failed to upload
2023-12-08 12:46:16,791 DEBUG: Analytics is enabled.
2023-12-08 12:46:16,862 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/1y/zq98_5fd2m31wwv7vvct5r0m0000gn/T/tmprv72ktcw', '-v']
2023-12-08 12:46:16,867 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/1y/zq98_5fd2m31wwv7vvct5r0m0000gn/T/tmprv72ktcw', '-v'] with pid 62712
Checking container-b in account 2 shows that the workspace file '38fe24bfb3b076c91a42e450e8a84b01' was pushed, but not 'b4c7f7bc7bfafae5ecf58323f107674d' (from the first commit of dummy data).
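For anyone wanting to repeat that check from the command line, here is a rough sketch. It assumes the files/md5 prefix seen in the verbose log and DVC's usual layout of splitting the hash after its first two characters; oid_to_blob and blob_exists are hypothetical helper names, and the credentials are taken from the same environment variables used in the steps above.

```shell
# Map a DVC object id (md5) to its blob name under the remote.
# $1: object id, e.g. 38fe24bfb3b076c91a42e450e8a84b01
oid_to_blob() {
    prefix=$(printf '%s' "$1" | cut -c1-2)   # first two hex characters
    rest=$(printf '%s' "$1" | cut -c3-)      # remainder of the hash
    printf 'files/md5/%s/%s\n' "$prefix" "$rest"
}

# Ask Azure whether the blob for an object id exists in a container.
# $1: container name, $2: object id
blob_exists() {
    az storage blob exists \
        --account-name "$AZURE_STORAGE_ACCOUNT" \
        --account-key "$AZURE_STORAGE_KEY" \
        --container-name "$1" \
        --name "$(oid_to_blob "$2")" \
        --query exists --output tsv
}
```

For example, blob_exists container-b b4c7f7bc7bfafae5ecf58323f107674d would query the new container for the object from the first commit.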
Expected
All tracked data from all commits would successfully push to the new account and container as specified in the current environment and config.
Environment information
Output of dvc doctor:
DVC version: 3.33.3 (brew)
--------------------------
Platform: Python 3.11.6 on macOS-13.5-arm64-arm-64bit
Subprojects:
dvc_data = 2.22.6
dvc_objects = 1.4.9
dvc_render = 1.0.0
dvc_task = 0.3.0
scmrepo = 1.5.0
Supports:
azure (adlfs = 2023.10.0, knack = 0.11.0, azure-identity = 1.15.0),
gdrive (pydrive2 = 1.18.0),
gs (gcsfs = 2023.12.1),
http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2023.12.1, boto3 = 1.33.1),
ssh (sshfs = 2023.10.0),
webdav (webdav4 = 0.9.8),
webdavs (webdav4 = 0.9.8),
webhdfs (fsspec = 2023.12.1)
Config:
Global: /Users/neverfox/Library/Application Support/dvc
System: /opt/homebrew/share/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: azure
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /opt/homebrew/var/cache/dvc/repo/326b7469459f67242b1221df6053081e
Additional Information (if any):
Hi @neverfox! This is expected behavior, since DVC will respect the config stored in each commit. The easiest way to migrate to a new remote would be to use Azure's own tools to copy the whole remote cache to the new location. Would that work for you?
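As an illustration only, such a copy could look roughly like the sketch below. It assumes azcopy is installed and that SAS tokens with sufficient permissions exist for both containers; OLD_ACCOUNT, NEW_ACCOUNT, OLD_SAS and NEW_SAS are placeholder variables, and container_url and copy_remote are hypothetical helper names.

```shell
# Build the blob-service URL for a container, with a SAS token appended.
# $1: storage account name, $2: container name, $3: SAS token (no leading '?')
container_url() {
    printf 'https://%s.blob.core.windows.net/%s?%s\n' "$1" "$2" "$3"
}

# Copy every blob from the old container into the new one.
# Reads OLD_ACCOUNT, OLD_SAS, NEW_ACCOUNT and NEW_SAS from the environment.
copy_remote() {
    azcopy copy \
        "$(container_url "$OLD_ACCOUNT" container-a "$OLD_SAS")" \
        "$(container_url "$NEW_ACCOUNT" container-b "$NEW_SAS")" \
        --recursive
}
```

Because the objects are stored under content-derived names, copying the blobs unchanged should leave them where DVC expects to find them once the remote URL points at the new container.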
Closing due to lack of response.