dvc push: fails for all commits if Azure remote url/account changed

Open • neverfox opened this issue 1 year ago • 1 comment

Bug Report

Description

I have a repository that was configured with an Azure remote named dvcstore, pointing to the container URL azure://container-a on the storage account account1, and all tracked files have been pushed there. Going forward, I wanted to use a different storage account, account2, and a different container, container-b, and to re-push everything in history to the new location (expecting to be able to delete the old account later). I changed the remote config to the new URL. At that point I thought it would just be a matter of running dvc push -A -R. While that does push all workspace files, it fails to push the tracked files from earlier commits, giving messages like ERROR: failed to transfer 'b4c7f7bc7bfafae5ecf58323f107674d' - Container does not exist. It seems to be using the config stored with those earlier commits, which has a different URL pointing to a container that, of course, doesn't exist in the new account. In other words, it's trying to push to azure://container-a instead of the new URL azure://container-b, despite the new config.
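
One way to check this hypothesis is to look at the remote URL that each commit's own .dvc/config records. Below is a minimal POSIX shell sketch. The function name dvc_urls_by_commit is hypothetical and is not part of DVC. The function walks every commit reachable from any ref with git rev-list --all and uses sed to pull the url = ... lines out of that commit's .dvc/config. It assumes DVC's usual config layout, where each remote's URL sits on its own url = line. It prints the URLs of every configured remote, not only dvcstore.

```shell
# Hypothetical helper: for every commit reachable from any ref, print the short
# hash followed by the remote url(s) recorded in that commit's .dvc/config.
# Run it from inside the Git/DVC repository.
dvc_urls_by_commit() {
  git rev-list --all | while read -r rev; do
    # Extract "url = ..." lines from the committed file. The result is empty
    # when the commit has no .dvc/config or no remote configured yet.
    urls=$(git show "$rev:.dvc/config" 2>/dev/null \
      | sed -n 's/^[[:space:]]*url[[:space:]]*=[[:space:]]*//p' \
      | tr '\n' ' ')
    printf '%s %s\n' "$(git rev-parse --short "$rev")" "${urls:-<none>}"
  done
}
```

In the reproduction, the commits from "Add DVC remote" through "Update dummy file" would be expected to list azure://container-a. Only "Update DVC remote" would be expected to list azure://container-b.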

Reproduce

az storage container create --name container-a --account-name <account_1_name> --account-key <account1_key>
az storage container create --name container-b --account-name <account_2_name> --account-key <account2_key>
mkdir dvc-push-test && cd dvc-push-test
git init
dvc init
git commit -m "Initialize DVC"
dvc remote add -d dvcstore azure://container-a
git add -A
git commit -m "Add DVC remote"
head -c 100 /dev/urandom > dummy
dvc add dummy
git add dummy.dvc .gitignore
git commit -m "Add dummy file"
export AZURE_STORAGE_ACCOUNT=<account_1_name>
export AZURE_STORAGE_KEY=<account1_key>
dvc push
head -c 100 /dev/urandom > dummy
dvc add dummy
git add dummy.dvc
git commit -m "Update dummy file"
dvc push
dvc remote modify dvcstore url azure://container-b
git add -A
git commit -m "Update DVC remote"
export AZURE_STORAGE_ACCOUNT=<account_2_name>
export AZURE_STORAGE_KEY=<account2_key>
dvc push -A -R

Output of dvc push -A -R:

ERROR: failed to transfer 'b4c7f7bc7bfafae5ecf58323f107674d' - Container does not exist.             
ERROR: failed to transfer '38fe24bfb3b076c91a42e450e8a84b01' - Container does not exist.             
ERROR: failed to push data to the cloud - 2 files failed to upload

Output of dvc push -A -R --verbose:

2023-12-08 12:46:14,788 DEBUG: v3.33.3 (brew), CPython 3.11.6 on macOS-13.5-arm64-arm-64bit
2023-12-08 12:46:14,788 DEBUG: command: /opt/homebrew/bin/dvc push -A -R --verbose
2023-12-08 12:46:14,972 DEBUG: Preparing to transfer data from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5' to 'azure://container-b/files/md5'
2023-12-08 12:46:14,972 DEBUG: Preparing to collect status from 'container-b/files/md5'              
2023-12-08 12:46:14,972 DEBUG: Collecting status from 'container-b/files/md5'                        
2023-12-08 12:46:14,973 DEBUG: Querying 1 oids via object_exists                                     
2023-12-08 12:46:15,434 DEBUG: Preparing to transfer data from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5' to 'azure://container-a/files/md5'                                              
2023-12-08 12:46:15,434 DEBUG: Preparing to collect status from 'container-a/files/md5'              
2023-12-08 12:46:15,434 DEBUG: Collecting status from 'container-a/files/md5'                        
2023-12-08 12:46:15,864 DEBUG: Estimated remote size: 4096 files                                     
2023-12-08 12:46:15,865 DEBUG: Large remote ('2' oids < '4.096' traverse weight), using object_exists for remaining oids                                                                                  
2023-12-08 12:46:15,865 DEBUG: Querying 2 oids via object_exists                                     
2023-12-08 12:46:16,165 DEBUG: Preparing to collect status from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5'                                                                                
2023-12-08 12:46:16,165 DEBUG: Collecting status from '/Users/neverfox/Repos/dvc-push-test/.dvc/cache/files/md5'
2023-12-08 12:46:16,358 ERROR: failed to transfer '38fe24bfb3b076c91a42e450e8a84b01' - Container does not exist.                                                                                          
Traceback (most recent call last):                                                                   
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1574, in _put_file
    await bc.upload_blob(
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 419, in upload_blob
    return await upload_block_blob(**options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 172, in upload_block_blob
    process_storage_error(error)
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 82, in upload_block_blob
    response = await client.upload(
               ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_generated/aio/operations/_block_blob_operations.py", line 256, in upload
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/exceptions.py", line 165, in map_error
    raise error
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:665e3800-901e-004f-71fe-2904b5000000
Time:2023-12-08T17:46:16.3277730Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:665e3800-901e-004f-71fe-2904b5000000
Time:2023-12-08T17:46:16.3277730Z</Message></Error>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 334, in transfer
    _try_links(
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 266, in _try_links
    return copy(
           ^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 87, in copy
    return _put(
           ^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 156, in _put
    return _put_one(from_paths[0], to_paths[0])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 148, in _put_one
    return put_file(from_path, to_path, callback=callback, **put_file_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 82, in func
    return wrapped(path1, path2, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 54, in wrapped
    res = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 547, in put_file
    self.fs.put_file(os.fspath(from_file), to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_azure/spec.py", line 9, in put_file
    return super().put_file(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1589, in _put_file
    raise FileNotFoundError("Container does not exist.")
FileNotFoundError: Container does not exist.

2023-12-08 12:46:16,493 ERROR: failed to transfer 'b4c7f7bc7bfafae5ecf58323f107674d' - Container does not exist.                                                                                          
Traceback (most recent call last):                                                                   
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1574, in _put_file
    await bc.upload_blob(
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 419, in upload_blob
    return await upload_block_blob(**options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 172, in upload_block_blob
    process_storage_error(error)
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 82, in upload_block_blob
    response = await client.upload(
               ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/storage/blob/_generated/aio/operations/_block_blob_operations.py", line 256, in upload
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/azure/core/exceptions.py", line 165, in map_error
    raise error
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:665e3864-901e-004f-4efe-2904b5000000
Time:2023-12-08T17:46:16.4586980Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:665e3864-901e-004f-4efe-2904b5000000
Time:2023-12-08T17:46:16.4586980Z</Message></Error>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/generic.py", line 148, in _put_one
    return put_file(from_path, to_path, callback=callback, **put_file_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 82, in func
    return wrapped(path1, path2, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/callbacks.py", line 54, in wrapped
    res = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 547, in put_file
    self.fs.put_file(os.fspath(from_file), to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc_azure/spec.py", line 9, in put_file
    return super().put_file(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/adlfs/spec.py", line 1589, in _put_file
    raise FileNotFoundError("Container does not exist.")
FileNotFoundError: Container does not exist.

2023-12-08 12:46:16,788 ERROR: failed to push data to the cloud - 2 files failed to upload           
Traceback (most recent call last):                                                                   
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
                            ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc/repo/__init__.py", line 60, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/dvc/3.33.3_1/libexec/lib/python3.11/site-packages/dvc/repo/push.py", line 144, in push
    raise UploadError(failed_count)
dvc.exceptions.UploadError: 2 files failed to upload

2023-12-08 12:46:16,791 DEBUG: Analytics is enabled.
2023-12-08 12:46:16,862 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/1y/zq98_5fd2m31wwv7vvct5r0m0000gn/T/tmprv72ktcw', '-v']
2023-12-08 12:46:16,867 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/1y/zq98_5fd2m31wwv7vvct5r0m0000gn/T/tmprv72ktcw', '-v'] with pid 62712

Checking container-b in account 2 shows that the workspace file '38fe24bfb3b076c91a42e450e8a84b01' was pushed, but 'b4c7f7bc7bfafae5ecf58323f107674d' (from the first commit of the dummy data) was not.
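
The check described above can be scripted with the Azure CLI. This sketch rests on two assumptions. First, DVC 3.x is assumed to store each object under files/md5/<first two hex characters>/<remaining characters>. The files/md5 prefix appears in the verbose log, and the two-character split reflects my understanding of DVC's cache layout. Second, az storage blob exists is assumed to accept the flags shown. Both function names are hypothetical helpers.

```shell
# Hypothetical helper: map a DVC md5 oid to its blob name under the remote
# root, assuming the layout files/md5/<2 hex chars>/<remaining chars>.
dvc_blob_name() {
  rest=${1#??}          # drop the first two characters of the oid
  prefix=${1%"$rest"}   # keep only the first two characters
  printf 'files/md5/%s/%s\n' "$prefix" "$rest"
}

# Hypothetical helper: ask Azure whether the object for <oid> exists in
# <container>, printing "true" or "false".
# usage: blob_exists <account_name> <account_key> <container> <oid>
blob_exists() {
  az storage blob exists \
    --account-name "$1" --account-key "$2" \
    --container-name "$3" --name "$(dvc_blob_name "$4")" \
    --query exists --output tsv
}
```

For example, blob_exists <account_2_name> <account2_key> container-b b4c7f7bc7bfafae5ecf58323f107674d would be expected to print false in the situation described here.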

Expected

All tracked data from all commits would successfully push to the new account and container as specified in the current environment and config.

Environment information

Output of dvc doctor:

DVC version: 3.33.3 (brew)
--------------------------
Platform: Python 3.11.6 on macOS-13.5-arm64-arm-64bit
Subprojects:
	dvc_data = 2.22.6
	dvc_objects = 1.4.9
	dvc_render = 1.0.0
	dvc_task = 0.3.0
	scmrepo = 1.5.0
Supports:
	azure (adlfs = 2023.10.0, knack = 0.11.0, azure-identity = 1.15.0),
	gdrive (pydrive2 = 1.18.0),
	gs (gcsfs = 2023.12.1),
	http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	oss (ossfs = 2023.12.0),
	s3 (s3fs = 2023.12.1, boto3 = 1.33.1),
	ssh (sshfs = 2023.10.0),
	webdav (webdav4 = 0.9.8),
	webdavs (webdav4 = 0.9.8),
	webhdfs (fsspec = 2023.12.1)
Config:
	Global: /Users/neverfox/Library/Application Support/dvc
	System: /opt/homebrew/share/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: azure
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /opt/homebrew/var/cache/dvc/repo/326b7469459f67242b1221df6053081e

Additional Information (if any):

neverfox, Dec 08 '23 17:12

Hi @neverfox! This is expected behavior, since DVC will respect the config stored in each commit. The easiest way to migrate to a new remote would be to use Azure to copy the whole remote cache to the new location. Would that work for you?

dberenbaum, Dec 15 '23 13:12
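
Here is a minimal sketch of the suggested migration. It assumes the Azure CLI's az storage blob copy start-batch command accepts the flags shown, which is worth verifying with az storage blob copy start-batch --help before relying on it. The function name copy_dvc_remote is hypothetical. The function starts a server-side copy of every blob in the old container into the new one. Blob names are preserved, so the files/md5/... layout that DVC expects under the remote URL is kept.

```shell
# Hypothetical helper: server-side copy of every blob from the old DVC remote
# container into the new one, keeping blob names (and thus DVC's layout).
# usage: copy_dvc_remote <src_account> <src_key> <src_container> \
#                        <dst_account> <dst_key> <dst_container>
copy_dvc_remote() {
  if [ "$#" -ne 6 ]; then
    echo "usage: copy_dvc_remote SRC_ACCOUNT SRC_KEY SRC_CONTAINER DST_ACCOUNT DST_KEY DST_CONTAINER" >&2
    return 2
  fi
  az storage blob copy start-batch \
    --source-account-name "$1" --source-account-key "$2" \
    --source-container "$3" \
    --account-name "$4" --account-key "$5" \
    --destination-container "$6"
}
```

For the setup in this issue, the call would be copy_dvc_remote <account_1_name> <account1_key> container-a <account_2_name> <account2_key> container-b. Copies between storage accounts may finish asynchronously on the service side, so it is worth confirming that the objects exist in container-b before deleting the old account. Because DVC respects the config stored in each commit, checking out an older commit will still resolve dvcstore to azure://container-a.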

Closing due to lack of response.

skshetry, Mar 25 '24 10:03