
`push`: Error when pushing to S3 `[Errno 22] Invalid Argument.: An error occurred (InvalidArgument) when calling the PutObject operation: Invalid Argument.`

Open pltrdy-spash opened this issue 7 months ago • 9 comments

Description

Using DVC with an S3 server, everything works fine except pushing, which fails with: [Errno 22] Invalid Argument.: An error occurred (InvalidArgument) when calling the PutObject operation: Invalid Argument.

Full log:

dvc push -v
2025-05-05 16:49:15,059 DEBUG: v3.50.0 (pip), CPython 3.10.16 on macOS-10.16-x86_64-i386-64bit
2025-05-05 16:49:15,059 DEBUG: command: /Users/pltrdy/anaconda3/envs/xx/bin/dvc push -v
Collecting                                                                                                                                                                   |0.00 [00:00,    ?entry/s]
2025-05-05 16:49:26,588 DEBUG: Preparing to transfer data from '/Users/pltrdy/xx/.dvc/cache/files/md5' to 's3://data-ia/files/md5'
2025-05-05 16:49:26,588 DEBUG: Preparing to collect status from 'data-ia/files/md5'
2025-05-05 16:49:26,791 DEBUG: Collecting status from 'data-ia/files/md5'
2025-05-05 16:49:34,254 DEBUG: Querying 21 oids via object_exists
2025-05-05 16:49:34,863 DEBUG: Querying 0 oids via object_exists                                                                                                                                       
2025-05-05 16:49:45,826 DEBUG: Estimated remote size: 458752 files                                                                                                                                     
2025-05-05 16:49:45,827 DEBUG: Large remote (255 oids < 458.752 traverse weight), using object_exists for remaining oids                                                                               
2025-05-05 16:49:45,827 DEBUG: Querying 255 oids via object_exists                                                                                                                                     
2025-05-05 16:49:47,488 DEBUG: Preparing to collect status from '/Users/pltrdy/yy/.dvc/cache/files/md5'                                                                                          
2025-05-05 16:49:47,656 DEBUG: Collecting status from '/Users/pltrdy/yy/.dvc/cache/files/md5'                                                                                                    
2025-05-05 16:50:03,701 ERROR: failed to transfer '147e909ee4a2b44dbdbbcd89fc633adc' - [Errno 22] Invalid Argument.: An error occurred (InvalidArgument) when calling the PutObject operation: Invalid Argument.                                                                                                                                                                                              
Traceback (most recent call last):                                                                                                                                                                     
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/s3fs/core.py", line 114, in _error_wrapper
    return await func(*args, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/aiobotocore/client.py", line 412, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the PutObject operation: Invalid Argument.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 349, in transfer
    _try_links(
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 281, in _try_links
    return copy(
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 88, in copy
    return _put(
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 161, in _put
    _put_one(from_paths[0], to_paths[0])
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 151, in _put_one
    return to_fs.put_file(
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 635, in put_file
    self.fs.put_file(os.fspath(from_file), to_info, callback=callback, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/s3fs/core.py", line 1266, in _put_file
    await self._call_s3(
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/s3fs/core.py", line 371, in _call_s3
    return await _error_wrapper(
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/s3fs/core.py", line 146, in _error_wrapper
    raise err
OSError: [Errno 22] Invalid Argument.

Pushing
2025-05-05 16:50:03,710 ERROR: failed to push data to the cloud - 1 files failed to upload                                                                                                             
Traceback (most recent call last):
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/zz/lib/python3.10/site-packages/dvc/repo/push.py", line 167, in push
    raise UploadError(failed_count)
dvc.exceptions.UploadError: 1 files failed to upload

2025-05-05 16:50:03,713 DEBUG: Analytics is enabled.
2025-05-05 16:50:03,754 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/n7/7d_sgddx1mj99gc0k1ndbmnh0000gq/T/tmpen5beiwu', '-v']
2025-05-05 16:50:03,781 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/n7/7d_sgddx1mj99gc0k1ndbmnh0000gq/T/tmpen5beiwu', '-v'] with pid 26407

Expected

Push the files

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.50.0 (pip)
-------------------------
Platform: Python 3.10.16 on macOS-10.16-x86_64-i386-64bit
Subprojects:
        dvc_data = 3.15.2
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.11
Supports:
        http (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        https (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        s3 (s3fs = 2025.3.2, boto3 = 1.37.3)
Config:
        Global: /Users/pltrdy/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/41a3d4fdaaa29cbe62d754804bb53f24

pltrdy-spash avatar May 05 '25 14:05 pltrdy-spash

@pltrdy-spash are you using some kind of S3-compatible storage or regular AWS? Was this error happening before, or is it something new?

shcheklein avatar May 05 '25 18:05 shcheklein

@shcheklein thanks for your response.

I'm using S3-compatible storage from OVH. On some machines it all worked on the first try. On Linux I also had this problem, depending on the DVC version, and had to downgrade to make it work (I don't remember the exact versions in this case).

My colleague on Mac has no problem. I installed the same package versions (listed in dvc doctor), but that does not seem to help, which makes me think it might not be easy to reproduce.

pltrdy-spash avatar May 06 '25 07:05 pltrdy-spash

@pltrdy-spash, please check the versions of aiobotocore and botocore and see that they match between you and your colleague.
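One way to compare environments is to print the installed versions of the relevant packages on each machine and diff the output. The following is only a minimal sketch using the standard library; the helper name `installed_versions` and the package list are illustrative choices, not part of DVC.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Package list is an assumption: the libraries named in this thread.
    report = installed_versions(
        ["dvc", "s3fs", "fsspec", "aiobotocore", "botocore", "boto3"]
    )
    for name, version in report.items():
        print(f"{name}=={version}")
```

Running the script inside each environment (for example in both conda envs) and comparing the lines should show whether aiobotocore and botocore differ.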

You can try passing --pdb to the above command, which will drop you into the pdb debugger, where you can inspect the stack.

I'd also suggest updating dvc; we cannot really support you with dvc==3.50.0, it's almost a year old at this point.

skshetry avatar May 06 '25 09:05 skshetry

I had downgraded dvc to check whether that would fix it.

I can reproduce it with 3.59.2:

DVC version: 3.59.2 (pip)
-------------------------
Platform: Python 3.10.16 on macOS-10.16-x86_64-i386-64bit
Subprojects:
        dvc_data = 3.16.10
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.9
Supports:
        http (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        https (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        s3 (s3fs = 2024.12.0, boto3 = 1.35.99)

and *boto* versions are:

pip freeze|grep boto
aiobotocore==2.22.0
boto3==1.35.99
botocore==1.37.3

pltrdy-spash avatar May 07 '25 14:05 pltrdy-spash

@pltrdy-spash, could you please try with the latest fsspec version?

skshetry avatar May 28 '25 12:05 skshetry

TL;DR

Same error. FYI, I face this issue on both Mac and Ubuntu (multiple installation options explored).

Before

  • dvc doctor
DVC version: 3.59.2 (pip)
-------------------------
Platform: Python 3.10.16 on macOS-10.16-x86_64-i386-64bit
Subprojects:
        dvc_data = 3.16.10
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.9
Supports:
        http (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        https (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        s3 (s3fs = 2024.12.0, boto3 = 1.35.99)
Config:
        Global: /Users/pltrdy/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/41a3d4fdaaa29cbe62d754804bb53f24
  • fsspec version
fsspec==2024.12.0

Commands

pip install -U fsspec s3fs # <- fsspec upgrade broke s3fs requirements so I upgraded it too 
[...]
Successfully installed fsspec-2025.5.1
Successfully installed s3fs-2025.5.1

After

  • dvc doctor

DVC version: 3.59.2 (pip)
-------------------------
Platform: Python 3.10.16 on macOS-10.16-x86_64-i386-64bit
Subprojects:
        dvc_data = 3.16.10
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.9
Supports:
        http (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        https (aiohttp = 3.11.10, aiohttp-retry = 2.9.1),
        s3 (s3fs = 2025.5.1, boto3 = 1.35.99)
  • fsspec
fsspec==2025.5.1

DVC Push

dvc push todel.dvc --verbose
2025-05-28 14:43:04,442 DEBUG: v3.59.2 (pip), CPython 3.10.16 on macOS-10.16-x86_64-i386-64bit
2025-05-28 14:43:04,443 DEBUG: command: /Users/pltrdy/anaconda3/envs/xxxx3/bin/dvc push todel.dvc --verbose
Collecting                                                                          |0.00 [00:00,    ?entry/s]
2025-05-28 14:43:05,912 DEBUG: Preparing to transfer data from '/Users/pltrdy/xxxx/.dvc/cache/files/md5' to 's3://yyyy/files/md5'
2025-05-28 14:43:05,912 DEBUG: Preparing to collect status from 'yyyy/files/md5'
2025-05-28 14:43:05,912 DEBUG: Collecting status from 'yyyy/files/md5'
2025-05-28 14:43:06,640 DEBUG: Querying 1 oids via object_exists
2025-05-28 14:43:07,473 DEBUG: Preparing to collect status from '/Users/pltrdy/xxxx/.dvc/cache/files/md5' 
2025-05-28 14:43:07,474 DEBUG: Collecting status from '/Users/pltrdy/xxxx/.dvc/cache/files/md5'           
2025-05-28 14:43:08,528 ERROR: failed to transfer 'a34ff804c055d29be7a10908acc5d296' - [Errno 22] Invalid Argument.: An error occurred (InvalidArgument) when calling the PutObject operation: Invalid Argument.            
Traceback (most recent call last):                                                                            
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/s3fs/core.py", line 114, in _error_wrapper
    return await func(*args, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/aiobotocore/client.py", line 412, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the PutObject operation: Invalid Argument.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 349, in transfer
    _try_links(
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 281, in _try_links
    return copy(
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 88, in copy
    return _put(
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 161, in _put
    _put_one(from_paths[0], to_paths[0])
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 151, in _put_one
    return to_fs.put_file(
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 635, in put_file
    self.fs.put_file(os.fspath(from_file), to_info, callback=callback, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/s3fs/core.py", line 1264, in _put_file
    await self._call_s3(
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/s3fs/core.py", line 371, in _call_s3
    return await _error_wrapper(
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/s3fs/core.py", line 146, in _error_wrapper
    raise err
OSError: [Errno 22] Invalid Argument.

Pushing
2025-05-28 14:43:08,541 ERROR: failed to push data to the cloud - 1 files failed to upload                    
Traceback (most recent call last):
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/Users/pltrdy/anaconda3/envs/xxxx3/lib/python3.10/site-packages/dvc/repo/push.py", line 174, in push
    raise UploadError(failed_count)
dvc.exceptions.UploadError: 1 files failed to upload

2025-05-28 14:43:08,545 DEBUG: Analytics is enabled.
2025-05-28 14:43:08,602 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/n7/7d_sgddx1mj99gc0k1ndbmnh0000gq/T/tmp3_uq60ka', '-v']
2025-05-28 14:43:08,621 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/n7/7d_sgddx1mj99gc0k1ndbmnh0000gq/T/tmp3_uq60ka', '-v'] with pid 28181

pltrdy-spash avatar May 28 '25 12:05 pltrdy-spash

Are you using AWS S3 or is it an s3-compatible storage?

Also, would it be possible to add a breakpoint inside s3fs and see what arguments dvc is passing?
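As a rough illustration of what such an inspection could look like: the traceback above passes through `_call_s3` in s3fs/core.py, so one option is to wrap that coroutine and print whatever it receives. This is only a sketch; `log_async_calls` and `FakeFS` are hypothetical names, and the stand-in class is there so the example runs without s3fs or network access. A plain `breakpoint()` placed at the `_put_file` frame shown in the trace would serve the same purpose.

```python
import asyncio
import functools


def log_async_calls(cls, method_name, sink=print):
    """Wrap the coroutine method `method_name` on `cls` so each call is reported."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    async def wrapper(self, *args, **kwargs):
        # Report the arguments, then delegate to the untouched original.
        sink(f"{cls.__name__}.{method_name} args={args!r} kwargs={kwargs!r}")
        return await original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    return original  # keep a handle so the patch can be undone


# Hypothetical usage against s3fs, applied before the upload is triggered:
#   import s3fs
#   log_async_calls(s3fs.S3FileSystem, "_call_s3")


class FakeFS:
    """Stand-in for the real filesystem class (assumption for the demo)."""

    async def _call_s3(self, method, **kwargs):
        return method


seen = []
log_async_calls(FakeFS, "_call_s3", sink=seen.append)
result = asyncio.run(FakeFS()._call_s3("put_object", Bucket="b", Key="k"))
```

The wrapper accepts `*args` and `**kwargs` so it does not depend on the exact signature of the method being patched.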

skshetry avatar May 28 '25 12:05 skshetry

> Are you using AWS S3 or is it an s3-compatible storage?

S3 compatible storage.

> Also, would it be possible to add a breakpoint inside s3fs and see what arguments dvc is passing?

Could you be more specific? From the trace, it seems that dvc_objects calls fsspec, which itself calls s3fs. What exactly do we want to inspect?

pltrdy-spash avatar Jun 03 '25 13:06 pltrdy-spash

> Are you using AWS S3 or is it an s3-compatible storage?
>
> S3 compatible storage.

I think the issue is caused by the integrity check that the new botocore client performs, see

  • https://github.com/boto/boto3/issues/4392
  • https://github.com/fsspec/s3fs/issues/931

You can either use an older version of botocore or set the following environment variable and see if it fixes the issue:

export AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED   
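If exporting the variable in the shell is inconvenient (for example in a script or CI job), the same setting can be applied to a single child process. This is a minimal sketch, assuming `dvc` is on the PATH; the helper names `checksum_workaround_env` and `push_with_workaround` are made up for illustration.

```python
import os
import subprocess


def checksum_workaround_env(base=None):
    """Copy an environment mapping and add the setting suggested in the thread."""
    env = dict(os.environ if base is None else base)
    env["AWS_REQUEST_CHECKSUM_CALCULATION"] = "WHEN_REQUIRED"
    return env


def push_with_workaround(args=("dvc", "push", "-v")):
    """Run the push with the variable set for the child process only."""
    # check=False: let the caller inspect returncode instead of raising.
    return subprocess.run(list(args), env=checksum_workaround_env(), check=False)
```

Calling `push_with_workaround()` leaves the parent shell's environment untouched, which makes it easy to compare a run with and without the setting.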

There is also good news: OVHCloud recently started supporting this, see:

I am not sure how OVHCloud works, but maybe this is already resolved for you.

skshetry avatar Jun 06 '25 10:06 skshetry