`FileNotFoundError` when using `download_file` method of s3 client object.

Open nick-pestell opened this issue 2 years ago • 9 comments

Describe the bug

When calling the download_file method of an S3 client object I receive a very unexpected error (see the traceback below). Internally, boto3 seems to be looking for a file at the destination path with some random characters appended to the end (.6E57BFFa).

Strangely this code was working yesterday.

I've also tested in a minimal example outside of the code base and the same error persists.

I've tested with multiple files in multiple buckets.

Expected Behavior

I expected the download_file method to work as designed without throwing this error.

Current Behavior

  File "/home/nickpestell/apha-csu/repos/btb-phylo/utils.py", line 117, in s3_download_file
    s3.download_file(bucket, key, dest)
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/boto3/s3/inject.py", line 190, in download_file
    return transfer.download_file(
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/boto3/s3/transfer.py", line 320, in download_file
    future.result()
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/tasks.py", line 139, in __call__
    return self._execute_main(kwargs)
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/tasks.py", line 162, in _execute_main
    return_value = self._main(**kwargs)
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/download.py", line 642, in _main
    fileobj.seek(offset)
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/utils.py", line 378, in seek
    self._open_if_needed()
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/utils.py", line 361, in _open_if_needed
    self._fileobj = self._open_function(self._filename, self._mode)
  File "/home/nickpestell/python-virtualenvs/btb-phylo/lib/python3.8/site-packages/s3transfer/utils.py", line 272, in open
    return open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/fsx-042/share/phyloConsensus/phyloConsensus/AF-12-01538-15.fas.6E57BFFa'
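
The random suffix in the failing path is consistent with how s3transfer appears to stage a download: it writes to a temporary file named after the destination plus a random extension, then renames it once the transfer completes. The sketch below reproduces that pattern with the standard library only. `staged_write` and its suffix generation are hypothetical stand-ins, not s3transfer's actual code. It suggests that opening the temporary file raises `FileNotFoundError` whenever the destination's parent directory is missing or unreachable at that moment, and that the error names the temporary path rather than the destination.

```python
import os
import tempfile
import uuid


def staged_write(dest: str, data: bytes) -> None:
    """Write data to dest via a temporary sibling file, then rename it.

    Hypothetical stand-in for the staging pattern; not s3transfer code.
    """
    temp_path = dest + os.extsep + uuid.uuid4().hex[:8]
    # open() fails here with FileNotFoundError if the parent directory
    # of dest does not exist; the error names temp_path, not dest.
    with open(temp_path, "wb") as f:
        f.write(data)
    os.replace(temp_path, dest)


with tempfile.TemporaryDirectory() as root:
    # Parent directory is missing: the staged write cannot start.
    missing = os.path.join(root, "no-such-dir", "AF-12-01538-15.fas")
    try:
        staged_write(missing, b"ACGT")
    except FileNotFoundError as exc:
        print("failed on:", exc.filename)

    # Parent directory exists: the write and rename succeed.
    present = os.path.join(root, "AF-12-01538-15.fas")
    staged_write(present, b"ACGT")
    print("written:", os.path.exists(present))
```

If this is what is happening, the thing to check at failure time is the parent directory of the destination (here a path under /mnt/fsx-042), not the S3 key.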

Reproduction Steps

I expect this will not reproduce but I am simply running:

import boto3

s3 = boto3.client('s3')
s3.download_file(<bucket>, <key>, <dest>)

where bucket, key and dest are all correctly formatted.

Possible Solution

No response

Additional Information/Context

No response

SDK version used

'1.24.28'

Environment details (OS name and version, etc.)

ubuntu 20.04

nick-pestell avatar Jul 13 '22 12:07 nick-pestell

Hi @nick-pestell thanks for reaching out. This appears to overlap with a recent issue: https://github.com/boto/boto3/issues/3319. As mentioned there, this is likely an issue with your path/sub-directories. Can you review this comment and let me know if the posts referenced there help clarify the behavior you are seeing?

tim-finnigan avatar Jul 13 '22 15:07 tim-finnigan

Thanks @tim-finnigan . I think I broadly understand the issue that these other users are facing, although I'm not sure that this applies here. I have confirmed that the key and bucket I'm using exist and are formatted correctly. I've even confirmed that the object exists by passing the key and bucket to the following function:

import boto3
import botocore.exceptions


def s3_object_exists(bucket, key):
    """
    Returns True if the S3 key is in the S3 bucket, False otherwise.
    Thanks: https://stackoverflow.com/questions/33842944/check-if-a-key-exists-in-a-bucket-in-s3-using-boto3
    """
    key_exists = True
    s3 = boto3.resource('s3')
    try:
        # load() issues a HEAD request for the object's metadata.
        s3.Object(bucket, key).load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            # The object does not exist.
            key_exists = False
        else:
            # Something else has gone wrong.
            raise
    return key_exists

nick-pestell avatar Jul 13 '22 15:07 nick-pestell

Update, this is working again today, without any changes to the code. I'm not sure I understand how this can be an issue with the s3 keys if it works sometimes and not others...

nick-pestell avatar Jul 14 '22 09:07 nick-pestell

It has again started throwing the error, without any change to our code or s3_keys.

nick-pestell avatar Jul 14 '22 13:07 nick-pestell

Thanks @nick-pestell for following up. Does the destination sub-directory you're trying to download to exist? You can use an approach like this to make sure the directories are created:

import os

import boto3

s3 = boto3.client('s3')
objects = s3.list_objects(Bucket='bucket')['Contents']
for obj in objects:
    s3_key = obj['Key']
    if s3_key.endswith("/"):
        # Folder placeholder object: create the matching local directory.
        os.makedirs(s3_key, exist_ok=True)
    else:
        # Regular object: download it to the same relative path.
        s3.download_file('bucket', s3_key, s3_key)

tim-finnigan avatar Jul 14 '22 17:07 tim-finnigan

Thanks @tim-finnigan. Yes, the sub-folder I'm attempting to download the file to does exist, and the S3 object is definitely a file (its key doesn't end with /). I don't know if it's useful to know, but when this error occurs I'm forced to switch to calling the AWS CLI from within Python (via a subprocess). This works fine (although significantly slower), again suggesting that the bug is probably caused by something within boto3/s3transfer?

nick-pestell avatar Jul 15 '22 08:07 nick-pestell

@tim-finnigan, do you have any more ideas? I'm relatively convinced that this is a bug, but I'm happy to explore other options if you have suggestions.

nick-pestell avatar Jul 19 '22 11:07 nick-pestell

Hi @nick-pestell could you provide the debug logs by adding boto3.set_stream_logger('') to your script (with any sensitive info redacted)? That would help give us more insight into the behavior you're seeing.
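
Because the failure is intermittent, it may help to leave debug logging switched on and send it to a file, so the log already exists when the error next occurs. boto3, botocore and s3transfer log through the standard `logging` module under loggers named after the packages, so a file handler can be attached without changing the download code. A minimal sketch, in which the helper name `log_aws_debug_to_file` and the log file name are assumptions:

```python
import logging


def log_aws_debug_to_file(path: str) -> logging.Handler:
    """Attach a DEBUG-level file handler to the AWS SDK loggers.

    Hypothetical helper; the logger names are the package names.
    """
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s [%(levelname)s] %(message)s")
    )
    for name in ("boto3", "botocore", "s3transfer"):
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(handler)
    return handler


# Usage (call once at startup, before the first download):
#   log_aws_debug_to_file("boto3-debug.log")
#   s3.download_file(bucket, key, dest)
```

Debug output can include request details, so the file should still be reviewed and redacted before it is shared.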

tim-finnigan avatar Jul 19 '22 17:07 tim-finnigan

Thanks @tim-finnigan. Frustratingly, it's working right now. I will try again over the next few days to see if I can catch it at a time when the above error occurs and send the log.

nick-pestell avatar Jul 25 '22 10:07 nick-pestell

@tim-finnigan

I ran into the same issue. However, I figured out what was happening in my situation.

When a subdirectory was created manually in the S3 console, list_objects returns it as an object. However, when a folder was created implicitly by uploading a file with boto3 under an intermediate subdirectory, list_objects does not return that folder.

My guess is that this is a bug and when folders are created while using the boto3 upload method, they're not added as objects to the s3 bucket.

More details:

These are the contents of my S3 bucket: (screenshot, 2022-10-28)

When I ran the following snippet:

for obj_metadata in s3.list_objects(Bucket="faiyaz-file-transfer-test")["Contents"]:
    print(obj_metadata["Key"])

the output I get is

.DS_Store
another-folder/
another-folder/hello.rtf
goodbye.rtf
hello.rtf
mango-folder/
subdir/goodbye.rtf
untitled folder/goodbye.rtf

The folders subdir and untitled folder were created when using boto3 to upload the last two files shown on the list above. You'll notice that they're not listed when list_objects is called. The folders another-folder and mango-folder do show up, but they were both created via the console.

FyzHsn avatar Oct 28 '22 20:10 FyzHsn

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

github-actions[bot] avatar Nov 22 '22 01:11 github-actions[bot]

@FyzHsn the behavior you're describing (a folder object is created if you use the S3 console to create folder/, but a folder object does not get created if an object is uploaded to folder/dog.png via boto3) is standard behavior across all S3 SDKs, not a bug. In the general case, there are no folder objects in S3 at all. An object folder/dog.png can exist without an object folder/ existing. Folders are, in a sense, virtual and inferred from key prefixes in existing objects.
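
To illustrate the point about virtual folders, here is a small sketch that infers the folder prefixes from the keys listed earlier in the thread and compares them with the explicit placeholder objects (keys ending in /). The function name `inferred_folders` is hypothetical; the keys are copied from the list_objects output above.

```python
from typing import Iterable, Set


def inferred_folders(keys: Iterable[str]) -> Set[str]:
    """Return every 'folder' prefix implied by the '/'-separated keys."""
    folders = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the final path component
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return folders


keys = [
    ".DS_Store",
    "another-folder/",
    "another-folder/hello.rtf",
    "goodbye.rtf",
    "hello.rtf",
    "mango-folder/",
    "subdir/goodbye.rtf",
    "untitled folder/goodbye.rtf",
]

# Placeholder objects exist only where something created a key ending in "/".
placeholders = {k for k in keys if k.endswith("/")}

print(sorted(inferred_folders(keys)))
# ['another-folder/', 'mango-folder/', 'subdir/', 'untitled folder/']
print(sorted(inferred_folders(keys) - placeholders))
# ['subdir/', 'untitled folder/']
```

A download loop that creates local directories only when it sees a placeholder key would therefore miss subdir/ and untitled folder/; deriving the directory from each object's key avoids that.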

john-aws avatar Nov 23 '22 15:11 john-aws

This bug is still present after following all the suggestions made above for boto3==1.26.90. Please provide some more guidance.

adityamcodes avatar Oct 05 '23 05:10 adityamcodes

I'm not sure, but in my case the error seems to be raised when I try to save a file to a directory that does not exist.

Suppose our code is:

bucket.download_file("my-s3-key", "foo/myfile.txt")

If the foo directory does not exist, the error is raised.

You should create the foo directory (mkdir) before downloading from S3. Give it a try.
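
A minimal sketch of that suggestion, assuming the only problem is a missing local directory. The wrapper name `download_with_parent` is hypothetical; `client` is expected to be a boto3 S3 client, whose `download_file` takes the bucket, key and local filename.

```python
import os


def download_with_parent(client, bucket: str, key: str, dest: str) -> None:
    """Create dest's parent directory, then download the object to dest.

    Hypothetical wrapper; `client` is expected to be a boto3 S3 client
    (anything with a compatible download_file method works).
    """
    parent = os.path.dirname(os.path.abspath(dest))
    os.makedirs(parent, exist_ok=True)  # no-op if it already exists
    client.download_file(bucket, key, dest)


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   download_with_parent(boto3.client("s3"), "my-bucket", "my-s3-key", "foo/myfile.txt")
```

This does not help when the directory exists but sits on a mount that is temporarily unavailable, which may be closer to the intermittent failures described at the top of the thread.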

seyoongit avatar Jan 24 '24 15:01 seyoongit