Add paginators
Hi colleagues,
It seems that I have run into the pagination problem myself :)
I was trying to get all of my historical training jobs:
sm_client <- paws::sagemaker(config = list(region = "myregion"))
total_training_jobs <- list()
sequence_var <- seq.POSIXt(from = as.POSIXct("2020-04-01 00:00:00"), to = as.POSIXct("2020-11-20 00:00:00"), by = "hour")
# A for loop over a POSIXct vector drops the class, so index into it instead.
for (j in seq_along(sequence_var)) {
  total_training_jobs[[j]] <- sm_client$list_training_jobs(MaxResults = 100, CreationTimeAfter = sequence_var[j])
}
This returned a 400 ThrottlingException.
Has anyone found a workaround?
BR /E
Hey, sorry about that. I'll look into this over the weekend. To my knowledge, the usual approach is to wait and retry throttled requests, waiting a bit longer after each failure.
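As an illustration of "delay and retry", here is a minimal sketch of a generic retry helper, assuming you want to wrap any single paws call. The name with_backoff and its parameters are hypothetical (not part of paws). Beyond plain exponential backoff, it adds random "full jitter" (a delay drawn uniformly between zero and an exponentially growing cap), a common refinement so that many clients do not retry in lockstep.

```r
# Hypothetical helper, not part of paws: re-run fn() on error, sleeping a
# random ("full jitter") delay in [0, min(cap, base * 2^attempt)] seconds.
with_backoff <- function(fn, max_retries = 5, base = 0.1, cap = 5) {
  attempt <- 0
  repeat {
    result <- tryCatch(fn(), error = function(e) e)
    # Success: hand back the response unchanged.
    if (!inherits(result, "error")) return(result)
    # Retry budget spent: re-signal the last error to the caller.
    if (attempt >= max_retries) stop(result)
    # The upper bound grows exponentially but never exceeds cap.
    delay <- stats::runif(1, min = 0, max = min(cap, base * 2^attempt))
    Sys.sleep(delay)
    attempt <- attempt + 1
  }
}
```

Usage would look like jobs <- with_backoff(function() sm_client$list_training_jobs(MaxResults = 100)). Note that this sketch retries on any error; a more careful version would inspect the error and retry only on throttling.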
I put together this attempt at a paginator. You pass your AWS API call as the argument f, and it fetches each page of results (following NextToken) and returns them all as a list. Below the function is an example call. Let me know whether this helps.
# Get all pages of a given API call, retrying with exponential backoff.
paginate <- function(f, max_retries = 5) {
  # Capture the unevaluated call and the environment it was written in,
  # so variables used in the call are found when it is re-evaluated.
  call <- substitute(f)
  env <- parent.frame()
  resp <- f
  result <- list(resp)
  while ("NextToken" %in% names(resp) && length(resp$NextToken) > 0 && resp$NextToken != "") {
    # Re-issue the original call with the token for the next page.
    call$NextToken <- resp$NextToken
    # Retry with exponential backoff.
    # See https://docs.aws.amazon.com/general/latest/gr/api-retries.html.
    # See also https://github.com/paws-r/paws/blob/main/examples/error_handling.R.
    retries <- 0
    repeat {
      resp <- tryCatch(eval(call, envir = env), error = function(e) e)
      if (!inherits(resp, "error")) break
      # Give up (and raise the error) once the retry budget is spent.
      if (retries >= max_retries) stop(resp)
      Sys.sleep(2^retries / 10)
      retries <- retries + 1
    }
    result <- c(result, list(resp))
  }
  return(result)
}
Below is an example call (using CloudWatch instead of SageMaker in my case). In your case, call list_training_jobs once with a fixed creation time, e.g. sm_client$list_training_jobs(MaxResults = 100, CreationTimeAfter = as.POSIXct("2020-04-01 00:00:00")). With a fixed creation time, the API splits the results into pages and the paginator fetches each one, (hopefully) up to the present.
cw <- paws::cloudwatch()
# metric_data_queries is your own list of metric data queries.
results <- paginate(
  cw$get_metric_data(
    MetricDataQueries = metric_data_queries,
    StartTime = as.POSIXct("2020-01-01"),
    EndTime = as.POSIXct("2020-11-22")
  )
)
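For the SageMaker case, the paginator returns a list of response pages rather than one table. A minimal sketch for flattening them, assuming each ListTrainingJobs page holds a TrainingJobSummaries list whose entries carry TrainingJobName, TrainingJobStatus and CreationTime (a POSIXct timestamp). The helper name training_jobs_to_df is hypothetical.

```r
# Hypothetical helper: flatten ListTrainingJobs response pages (e.g. the
# list returned by paginate()) into a single data frame, one row per job.
training_jobs_to_df <- function(pages) {
  # Concatenate the per-page summary lists into one flat list of summaries.
  summaries <- unlist(lapply(pages, function(page) page$TrainingJobSummaries),
                      recursive = FALSE)
  if (length(summaries) == 0) {
    return(data.frame(name = character(0), status = character(0),
                      created = .POSIXct(numeric(0), tz = "UTC"),
                      stringsAsFactors = FALSE))
  }
  data.frame(
    name = vapply(summaries, function(s) s$TrainingJobName, character(1)),
    status = vapply(summaries, function(s) s$TrainingJobStatus, character(1)),
    # c() on POSIXct values keeps the date-time class.
    created = do.call(c, lapply(summaries, function(s) s$CreationTime)),
    stringsAsFactors = FALSE
  )
}
```

With the thread's paginator this would be used as training_jobs_to_df(paginate(sm_client$list_training_jobs(MaxResults = 100, CreationTimeAfter = as.POSIXct("2020-04-01 00:00:00")))).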
Of course!
How silly of me to have overlooked the NextToken approach.
The solution works perfectly, @davidkretch. Thanks for that!
BR
@davidkretch @adambanker
For paginators, I am toying with the idea of an apply method.
First, there is the standard paginator, which loops over every token:
library(paws.common)
s3 <- paws::s3()
out <- paginate(
  s3$list_objects_v2(
    Bucket = "my_bucket"
  )
)
Second, there is an apply "family" of paginators that lets users apply a function to each response from the operation.
Basic example:
out <- paginate_lapply(
  s3$list_objects_v2(
    Bucket = "my_bucket"
  ),
  \(resp) {
    resp$Contents
  }
)
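To make the apply idea concrete, here is one way such a function could work, sketched under assumptions; it is not the paws.common implementation. The name paginate_lapply_sketch and its input_token, output_token and max_pages parameters are hypothetical. Token names are parameters because operations differ: S3 ListObjectsV2, for example, takes ContinuationToken and returns NextContinuationToken rather than NextToken. Like paginate() above, it captures the unevaluated call with substitute() and re-evaluates it with the new token.

```r
# Hypothetical sketch of an apply-style paginator (not the paws.common API).
# Evaluates `operation` once per page, feeding each page's output token back
# in as the next request's input token, and applies FUN to every response.
paginate_lapply_sketch <- function(operation, FUN, ...,
                                   input_token = "NextToken",
                                   output_token = "NextToken",
                                   max_pages = Inf) {
  call <- substitute(operation)
  env <- parent.frame()
  results <- list()
  pages <- 0
  token <- NULL
  repeat {
    if (!is.null(token)) call[[input_token]] <- token
    resp <- eval(call, envir = env)
    # list() wrapper keeps NULL results instead of silently dropping them.
    results <- c(results, list(FUN(resp, ...)))
    pages <- pages + 1
    token <- resp[[output_token]]
    # Stop when no (or an empty) token comes back, or the page cap is hit.
    if (length(token) == 0 || !nzchar(token) || pages >= max_pages) break
  }
  results
}
```

For S3 the call would pass input_token = "ContinuationToken" and output_token = "NextContinuationToken"; the returned list has one FUN result per page, which unlist() or do.call(rbind, ...) can combine.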
What are your thoughts on this? I would like your feedback before I go too far down the rabbit hole 😆
paws v0.4.0 has now been released to CRAN. I will close this ticket for now.