Check if an S3 object exists without generating an error?
targets uses paws for S3 storage, and it needs to check whether an object exists before deciding how to proceed. Is it possible to check efficiently for the existence of an object without generating an error when the object does not exist? The existing tryCatch(..., http_400 = ...) workaround seems okay, but http_400 could cover other kinds of errors besides a missing object.
Hi @wlandau, I don't think this fully answers your question, but you could make the error check more specific. Instead of http_400, you could use http_404. That way you won't accidentally mask 403 Forbidden errors (I believe missing access permissions fall into this category).
So for example:
```r
# Send a HEAD request for one object; errors if the request fails.
aws_s3_head <- function(key, bucket, region = NULL, version = NULL) {
  if (!is.null(region)) {
    withr::local_envvar(.new = list(AWS_REGION = region))
  }
  args <- list(
    Key = key,
    Bucket = bucket
  )
  if (!is.null(version)) {
    args$VersionId <- version
  }
  do.call(what = paws::s3()$head_object, args = args)
}

# Return TRUE if the HEAD request succeeds.
aws_s3_head_true <- function(key, bucket, region = NULL, version = NULL) {
  aws_s3_head(
    key = key,
    bucket = bucket,
    region = region,
    version = version
  )
  TRUE
}

# Old: any http_400 condition (including 403 Forbidden) becomes FALSE.
old_aws_s3_exists <- function(key, bucket, region = NULL, version = NULL) {
  tryCatch(
    aws_s3_head_true(
      key = key,
      bucket = bucket,
      region = region,
      version = version
    ),
    http_400 = function(condition) {
      FALSE
    }
  )
}

# New: only 404 Not Found becomes FALSE; other errors propagate.
new_aws_s3_exists <- function(key, bucket, region = NULL, version = NULL) {
  tryCatch(
    aws_s3_head_true(
      key = key,
      bucket = bucket,
      region = region,
      version = version
    ),
    http_404 = function(condition) {
      FALSE
    }
  )
}

# AWS S3 bucket that the IAM role doesn't have permission to access
bucket <- "made-up-bucket-1"
key <- "made-up"
old_aws_s3_exists(key, bucket)
#> [1] FALSE
new_aws_s3_exists(key, bucket)
#> Error: SerializationError (HTTP 403). failed to read from query HTTP response body

# AWS S3 object doesn't exist
bucket <- "made-up-bucket-2"
key <- "made-up"
old_aws_s3_exists(key, bucket)
#> [1] FALSE
new_aws_s3_exists(key, bucket)
#> [1] FALSE
```
Created on 2022-01-11 by the reprex package (v2.0.1)
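The reprex above needs real AWS credentials. The same handler dispatch can be sketched offline by building the conditions by hand. This assumes, as the reprex output suggests, that paws tags a missing object with both http_404 and http_400, and that a 403 carries http_400 (the http_403 class used here is an additional assumption). make_http_condition(), exists_broad(), exists_narrow() and the fake HEAD functions are hypothetical names for illustration, not part of paws.

```r
# Build a condition object mimicking an HTTP error from paws.
# ASSUMPTION: a status-specific class ("http_404", "http_403") plus a
# broader "http_400" class, as suggested by the reprex output above.
make_http_condition <- function(status) {
  structure(
    list(message = paste("HTTP", status), call = NULL),
    class = c(paste0("http_", status), "http_400", "error", "condition")
  )
}

# Mirrors old_aws_s3_exists(): any condition of class http_400 becomes FALSE.
exists_broad <- function(head) {
  tryCatch(
    {
      head()
      TRUE
    },
    http_400 = function(condition) FALSE
  )
}

# Mirrors new_aws_s3_exists(): only http_404 becomes FALSE.
exists_narrow <- function(head) {
  tryCatch(
    {
      head()
      TRUE
    },
    http_404 = function(condition) FALSE
  )
}

# Hypothetical fake HEAD requests standing in for paws::s3()$head_object().
found <- function() invisible(list())
not_found <- function() stop(make_http_condition(404))
forbidden <- function() stop(make_http_condition(403))

exists_broad(found)      # TRUE
exists_broad(not_found)  # FALSE
exists_broad(forbidden)  # FALSE: the permission problem is silently hidden
exists_narrow(not_found) # FALSE
try(exists_narrow(forbidden)) # the 403 error propagates instead
```

Because tryCatch() matches handlers against the condition's class vector, a handler for the broad class catches every status that carries it, while a handler for the specific class lets everything else through.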
I hope this helps 😄
Reference: boto3.client.s3.head_object
Alternatively, you could call s3fs::s3_file_exists(), which returns a logical value (or raises an error if permissions prohibit access). It is also vectorized!
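To get vectorized behaviour while staying with paws, one possible sketch maps the 404-only check over several keys with vapply(). aws_s3_exists_vec() and fake_head() are hypothetical names; the head argument is an injection point so the logic can run without AWS, and in real use you might pass the aws_s3_head() function from the example above. As before, it assumes a missing object signals a condition of class http_404. Unlike a batch API, it still issues one HEAD request per key.

```r
# Hypothetical vectorized wrapper around a single-object HEAD request.
# `head` is any function(key, bucket) that returns normally when the object
# exists and signals a condition of class "http_404" when it does not
# (an assumption based on the reprex above). Other errors propagate.
aws_s3_exists_vec <- function(keys, bucket, head) {
  vapply(
    keys,
    function(key) {
      tryCatch(
        {
          head(key = key, bucket = bucket)
          TRUE
        },
        http_404 = function(condition) FALSE
      )
    },
    FUN.VALUE = logical(1),
    USE.NAMES = TRUE
  )
}

# Hypothetical fake HEAD function for demonstration: only "a.csv" exists.
fake_head <- function(key, bucket) {
  if (key != "a.csv") {
    stop(structure(
      list(message = paste("Not found:", key), call = NULL),
      class = c("http_404", "http_400", "error", "condition")
    ))
  }
  invisible(list())
}

aws_s3_exists_vec(c("a.csv", "b.csv"), bucket = "my-bucket", head = fake_head)
```

With real credentials, the call would look like aws_s3_exists_vec(keys, bucket = "my-bucket", head = aws_s3_head), returning a logical vector named by key.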