Check if an S3 object exists without generating an error?

Open wlandau opened this issue 2 years ago • 2 comments

targets uses paws for S3 storage, and it needs to check if the object exists before knowing how to proceed. Is it possible to efficiently check the existence of an object without generating an error if the object does not exist? The existing tryCatch(..., http_400 = ...) workaround seems okay, but http_400 could include other kinds of errors than a missing object.

wlandau, Jan 10 '22 19:01

Hi @wlandau, I don't think this answers your question fully, but you could make the error check more specific. Instead of using http_400 you could use http_404. That way you won't accidentally mask 403 Forbidden errors (I believe missing access permissions fall into this category).

So for example:

aws_s3_head <- function(key, bucket, region = NULL, version = NULL) {
  if (!is.null(region)) {
    withr::local_envvar(.new = list(AWS_REGION = region))
  }
  args <- list(
    Key = key,
    Bucket = bucket
  )
  if (!is.null(version)) {
    args$VersionId <- version
  }
  do.call(what = paws::s3()$head_object, args = args)
}

aws_s3_head_true <- function(key, bucket, region = NULL, version = NULL) {
  aws_s3_head(
    key = key,
    bucket = bucket,
    region = region,
    version = version
  )
  TRUE
}

old_aws_s3_exists <- function(key, bucket, region = NULL, version = NULL) {
  tryCatch(
    aws_s3_head_true(
      key = key,
      bucket = bucket,
      region = region,
      version = version
    ),
    http_400 = function(condition) {
      FALSE
    }
  )
}

new_aws_s3_exists <- function(key, bucket, region = NULL, version = NULL) {
  tryCatch(
    aws_s3_head_true(
      key = key,
      bucket = bucket,
      region = region,
      version = version
    ),
    http_404 = function(condition) {
      FALSE
    }
  )
}


# S3 bucket that the current IAM role does not have permission to access
bucket = "made-up-bucket-1"
key = "made-up"

old_aws_s3_exists(key, bucket)
#> [1] FALSE
new_aws_s3_exists(key, bucket)
#> Error: SerializationError (HTTP 403). failed to read from query HTTP response body

# aws s3 object doesn't exist
bucket = "made-up-bucket-2"
key = "made-up"

old_aws_s3_exists(key, bucket)
#> [1] FALSE
new_aws_s3_exists(key, bucket)
#> [1] FALSE

Created on 2022-01-11 by the reprex package (v2.0.1)
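
To see why the handler name matters without touching AWS, here is a minimal sketch that simulates the errors. It assumes, as the reprex output above suggests, that the error condition carries both a status-specific class (such as http_404) and a family class (http_400). fake_head_object(), exists_404_only() and exists_any_4xx() are hypothetical names used only for illustration; they are not part of paws.

```r
# Hypothetical stand-in for paws::s3()$head_object(). It signals an error whose
# class vector mimics what the reprex implies (an assumption, not verified
# against paws internals).
fake_head_object <- function(status) {
  if (status == 200L) {
    return(list(ContentLength = 0L))
  }
  cond <- structure(
    list(message = paste("HTTP", status), call = NULL),
    class = c(
      paste0("http_", status),                    # e.g. "http_404"
      paste0("http_", (status %/% 100L) * 100L),  # e.g. "http_400"
      "error",
      "condition"
    )
  )
  stop(cond)
}

# Treats only a 404 as "object does not exist"; other errors propagate.
exists_404_only <- function(status) {
  tryCatch(
    {
      fake_head_object(status)
      TRUE
    },
    http_404 = function(condition) FALSE
  )
}

# Treats every 4xx error as "object does not exist".
exists_any_4xx <- function(status) {
  tryCatch(
    {
      fake_head_object(status)
      TRUE
    },
    http_400 = function(condition) FALSE
  )
}

exists_404_only(200L)
#> [1] TRUE
exists_404_only(404L)
#> [1] FALSE
exists_any_4xx(403L)  # the permission problem is silently reported as "missing"
#> [1] FALSE
res <- tryCatch(exists_404_only(403L), error = function(e) e)
inherits(res, "http_403")  # the 403 is still visible to the caller
#> [1] TRUE
```

In this sketch the narrower handler lets the 403 reach the caller, which can then decide how to handle a permissions problem separately from a missing object.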

I hope this helps 😄

Reference: boto3.client.s3.head_object

DyfanJones, Jan 11 '22 11:01

Alternatively, you could call s3fs::s3_file_exists(), which returns a logical value (or throws an error if the permissions prohibit access). It is also vectorized!
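
A rough sketch of how that might look, assuming s3fs accepts "s3://bucket/key" URIs as paths. s3_uri() and s3_keys_exist() are hypothetical helpers, not part of s3fs, and the bucket and key names are the made-up ones from the reprex above.

```r
# Hypothetical helper: build "s3://bucket/key" URIs (vectorized over key).
s3_uri <- function(bucket, key) {
  paste0("s3://", bucket, "/", key)
}

# Hypothetical wrapper around s3fs::s3_file_exists(). It needs the s3fs package
# and valid AWS credentials at the moment it is called.
s3_keys_exist <- function(bucket, keys) {
  s3fs::s3_file_exists(s3_uri(bucket, keys))
}

# Usage (not run here, since it contacts AWS):
# s3_keys_exist("made-up-bucket-2", c("made-up", "also-made-up"))
```

Because the check is vectorized, several keys can be tested in one call instead of looping over head_object().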

tyner, Mar 05 '24 18:03