dvc
Support virtual-hosted–style only S3-compatible remote?
Does DVC support virtual-hosted–style S3-compatible remotes?
From the docs, DVC seems to support only path-style S3 remotes.
When I try to use DVC with Tinder Object Storage (TOS), which supports only virtual-hosted–style S3 access, it reports an error.
- BytePlus | Compatibility with Amazon S3
- Virtual hosting of buckets - Amazon Simple Storage Service
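For readers unfamiliar with the two addressing styles, the difference is only in where the bucket name appears in the request URL. The following is a minimal sketch using the standard library; the helper name `object_url` and the endpoint `s3.example.com` are made up for illustration and are not part of DVC, boto3, or TOS.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, style: str) -> str:
    """Build the request URL for an object under the given addressing style."""
    parts = urlsplit(endpoint)
    if style == "path":
        # Path style: the bucket is the first segment of the URL path.
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    if style == "virtual":
        # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
        return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))
    raise ValueError(f"unknown addressing style: {style!r}")


# Hypothetical endpoint, bucket, and key, used only to show the two URL shapes.
print(object_url("https://s3.example.com", "my-bucket", "data/file.bin", "path"))
print(object_url("https://s3.example.com", "my-bucket", "data/file.bin", "virtual"))
```

A service that accepts only the second shape will reject requests sent in the first, which is the situation described here.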
Error info:
$ cat .dvc/config
[core]
remote = tos
['remote "tos"']
url = s3://xxx/xxx (edited)
endpointurl = https://tos-s3-xxx.xxxx.com (edited)
$ dvc push
Collecting |0.00 [00:00, ?entry/s]
Pushing
ERROR: unexpected error - Forbidden: An error occurred (403) when calling the HeadObject operation: Forbidden
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
$ dvc push --verbose
2024-01-25 17:21:30,113 DEBUG: v3.42.0 (brew), CPython 3.12.1 on macOS-14.2.1-arm64-arm-64bit
2024-01-25 17:21:30,113 DEBUG: command: /opt/homebrew/bin/dvc push --verbose
Collecting |0.00 [00:00, ?entry/s]
2024-01-25 17:21:30,346 DEBUG: Preparing to transfer data from 'x'
2024-01-25 17:21:30,346 DEBUG: Preparing to collect status from 'x'
2024-01-25 17:21:30,346 DEBUG: Collecting status from 'x'
2024-01-25 17:21:30,347 DEBUG: Querying 1 oids via object_exists
Pushing
2024-01-25 17:21:30,689 ERROR: unexpected error - Forbidden: An error occurred (403) when calling the HeadObject operation: Forbidden
Traceback (most recent call last):
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 113, in _error_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/aiobotocore/client.py", line 408, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/cli/__init__.py", line 211, in main
ret = cmd.do_run()
^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/cli/command.py", line 27, in do_run
return self.run()
^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/commands/data_sync.py", line 64, in run
processed_files_count = self.repo.push(
^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/repo/__init__.py", line 65, in wrapper
return f(repo, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc/repo/push.py", line 144, in push
push_transferred, push_failed = ipush(
^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/index/push.py", line 75, in push
result = transfer(
^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
status = compare_status(
^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/hashfile/status.py", line 178, in compare_status
dest_exists, dest_missing = status(
^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_data/hashfile/status.py", line 150, in status
exists.update(odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/db.py", line 422, in oids_exist
return list(wrap_iter(remote_oids, callback))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/db.py", line 36, in wrap_iter
for index, item in enumerate(iterable, start=1):
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/db.py", line 370, in list_oids_exists
in_remote = self.fs.exists(paths, batch_size=jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/fs/base.py", line 472, in exists
return fut.result()
^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.1_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.1_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/dvc_objects/executors.py", line 135, in batch_coros
result = fut.result()
^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 1035, in _exists
await self._info(path, bucket, key, version_id=version_id)
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 1302, in _info
out = await self._call_s3(
^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 348, in _call_s3
return await _error_wrapper(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/dvc/3.42.0/libexec/lib/python3.12/site-packages/s3fs/core.py", line 140, in _error_wrapper
raise err
PermissionError: Forbidden
2024-01-25 17:21:30,721 DEBUG: Version info for developers:
DVC version: 3.42.0 (brew)
--------------------------
Platform: Python 3.12.1 on macOS-14.2.1-arm64-arm-64bit
Subprojects:
dvc_data = 3.8.0
dvc_objects = 3.0.6
dvc_render = 1.0.1
dvc_task = 0.3.0
scmrepo = 2.0.4
Supports:
azure (adlfs = 2023.12.0, knack = 0.11.0, azure-identity = 1.15.0),
gdrive (pydrive2 = 1.19.0),
gs (gcsfs = 2023.12.2.post1),
http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2023.12.2, boto3 = 1.34.22),
ssh (sshfs = 2023.10.0),
webdav (webdav4 = 0.9.8),
webdavs (webdav4 = 0.9.8),
webhdfs (fsspec = 2023.12.2)
Config:
Global: /Users/bytedance/Library/Application Support/dvc
System: /opt/homebrew/share/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /opt/homebrew/var/cache/dvc/repo/b8147adaf473f039b47ac961430c05d4
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-01-25 17:21:30,725 DEBUG: Analytics is enabled.
2024-01-25 17:21:30,755 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/x3/m9b0vhpn28d_557yb666lf4w0000gn/T/tmpay2t_qwd', '-v']
2024-01-25 17:21:30,762 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/x3/m9b0vhpn28d_557yb666lf4w0000gn/T/tmpay2t_qwd', '-v'] with pid 98616
DVC should work properly with the virtual-host style endpointurl addressing. The error indicates that you don't have the right permissions to access that bucket. Are you able to use the AWS CLI to access that byte-plus bucket?
It looks like you have to pass addressing_style: 'virtual' to botocore to enable this?
https://github.com/boto/boto3/issues/2477
In botocore it defaults to auto, where, if you are setting an endpointurl, it uses virtual style and then falls back to path style.
https://github.com/boto/botocore/blob/e7c5b6ab22174797db551f44053a0b2245430649/botocore/utils.py#L2604-L2614
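As a rough sketch of what passing `addressing_style` to botocore looks like when using boto3 directly: the `s3` option of `botocore.config.Config` accepts an `addressing_style` of `auto`, `virtual`, or `path`. The helper names `s3_config_kwargs` and `make_s3_client` are hypothetical and only serve the example; this is not how DVC itself is configured.

```python
VALID_STYLES = {"auto", "virtual", "path"}


def s3_config_kwargs(addressing_style: str = "virtual") -> dict:
    """Return keyword arguments for botocore.config.Config (hypothetical helper)."""
    if addressing_style not in VALID_STYLES:
        raise ValueError(f"unsupported addressing style: {addressing_style!r}")
    return {"s3": {"addressing_style": addressing_style}}


def make_s3_client(endpoint_url: str, addressing_style: str = "virtual"):
    """Create a boto3 S3 client pinned to one addressing style."""
    # Imported lazily so this module still loads where boto3 is not installed.
    import boto3
    from botocore.config import Config

    config = Config(**s3_config_kwargs(addressing_style))
    # Credentials are resolved from the usual AWS sources (environment, config files).
    return boto3.client("s3", endpoint_url=endpoint_url, config=config)
```

Calling `make_s3_client` with the remote's endpoint URL and then issuing a request such as `head_object` against a known key is one way to check whether forcing the style changes the outcome.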
Thank you, @skshetry and @pmrowla.
When forcing boto3 to use {'addressing_style': 'virtual'}, I can access the BytePlus bucket:
When I do not, it raises an error:
What should I do to make dvc work?
The error given with s3fs/DVC is a permission error when accessing a specific file. It does not look like s3fs is failing to create the client session (InvalidPathAccess is not the exception raised when you use DVC). This would normally mean that the issue is with the credentials you are setting for your DVC remote, not with an incorrect addressing style.
@0ut0fcontrol can you verify that you are able to access your bucket via the AWS CLI?
@pmrowla Yes, I can access my bucket via the AWS CLI:
And if I insert config_kwargs['s3'] = {'addressing_style': 'virtual'} in here, DVC works fine.
However, I don't know how to pass this kwarg from dvc cli into s3fs.
- With {'addressing_style': 'virtual'} added, DVC works.
- With {'addressing_style': 'virtual'} commented out, DVC stops working again.
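For reference, the same setting can be passed when constructing an s3fs filesystem directly, outside DVC. This is a minimal sketch: `s3fs.S3FileSystem` accepts `client_kwargs` (forwarded to the botocore client) and `config_kwargs` (forwarded to `botocore.client.Config`). The helper names `s3fs_kwargs` and `make_filesystem` are hypothetical, and this does not show a DVC option.

```python
def s3fs_kwargs(endpoint_url: str, addressing_style: str = "virtual") -> dict:
    """Assemble S3FileSystem keyword arguments (hypothetical helper)."""
    return {
        # Forwarded to the botocore client constructor.
        "client_kwargs": {"endpoint_url": endpoint_url},
        # Forwarded to botocore.client.Config.
        "config_kwargs": {"s3": {"addressing_style": addressing_style}},
    }


def make_filesystem(endpoint_url: str, addressing_style: str = "virtual"):
    """Create an s3fs filesystem that forces the given addressing style."""
    # Imported lazily so this module still loads where s3fs is not installed.
    import s3fs

    return s3fs.S3FileSystem(**s3fs_kwargs(endpoint_url, addressing_style))
```

With a filesystem built this way, `fs.exists("bucket/key")` exercises the same HeadObject path that fails in the traceback above.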
This looks to me like it may be an aiobotocore or botocore bug, since their behavior of defaulting to virtual-host addressing and falling back to path addressing may not be working correctly.
But we can add support for setting this explicitly in dvc-s3 (we set config timeouts in the same way). It will also need to be added to the DVC remote config schema. https://github.com/iterative/dvc-s3/blob/43f70226160f1a5c9ffdad4092d41a8bab7ec19b/dvc_s3/__init__.py#L176-L179