Add parallel option to `gcs_copy_object()`
Currently (as I understand it), `gcs_copy_object()` only supports a sequential copy: you loop over a list of objects and call the copy on each one separately.
However, the `gsutil` CLI allows a multi-threaded copy with the `-m` parameter:
https://cloud.google.com/storage/docs/gsutil/commands/cp#description
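For reference, this is roughly what the multi-threaded CLI copy looks like (bucket and prefix names here are placeholders):

```shell
# -m enables parallel (multi-threaded/multi-process) operation;
# cp -r copies everything under the given prefix recursively
gsutil -m cp -r gs://source-bucket/some/prefix gs://destination-bucket/some/prefix
```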
Do you think this could be implemented in the R package?
If so, let me know if you need help with this :)
Thanks, Tamas
Thanks for raising the issue. I was thinking of doing this via `future`, but that uses up CPU cores; since then I have found `curl::curl_fetch_multi()`, which is much better, though it means using `curl` instead of `httr` under the hood, which is a bit more complicated. I have an implementation for Cloud Run URLs that could be used as a basis, though:
https://github.com/MarkEdmondson1234/googleCloudRunner/blob/b0db74e914b81737c57556216104f782f6313d4b/R/jwt-requests.R#L124-L154
```r
cr_jwt_async <- function(urls, token, ...){

  # called when a request fails at the transport level
  failure <- function(str){
    cat(paste("Failed request:", str), file = stderr())
  }

  results <- list()

  # collect the body of each successful response; log anything non-200
  success <- function(x){
    if(x$status_code == 200){
      results <<- append(results, list(rawToChar(x$content)))
    } else {
      myMessage(x$status_code, "failure for request", x$url, level = 3)
    }
  }

  pool <- new_pool()

  # queue one authenticated handle per URL on the shared pool
  lapply(urls, function(x){
    myMessage("Calling asynch: ", x, level = 3)
    h <- new_handle(url = x, ...)
    h <- cr_jwt_with_curl(h = h, token = token)
    curl_fetch_multi(x,
                     done = success, fail = failure,
                     handle = h, pool = pool)
  })

  # perform all queued requests concurrently
  multi_run(pool = pool)

  results
}
```
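As a rough sketch of how that same pattern could apply to this issue: the GCS JSON API exposes a server-side copy endpoint (`.../o/{object}/copyTo/b/{bucket}/o/{object}`), so one handle per object could be queued on a shared pool. This is an untested illustration only; `add_auth` is a hypothetical callback for attaching an OAuth token to a handle, not an existing googleCloudStorageR function:

```r
library(curl)

# Sketch: parallel server-side copies via the GCS JSON API.
# `add_auth` is a hypothetical function(handle) -> handle that adds
# an Authorization header -- not part of the package today.
gcs_copy_objects_async <- function(objects, source_bucket,
                                   destination_bucket, add_auth){
  results <- list()
  pool <- new_pool()

  lapply(objects, function(obj){
    url <- sprintf(
      "https://storage.googleapis.com/storage/v1/b/%s/o/%s/copyTo/b/%s/o/%s",
      source_bucket, curl_escape(obj),
      destination_bucket, curl_escape(obj)
    )
    h <- new_handle(url = url, customrequest = "POST")
    h <- add_auth(h)
    curl_fetch_multi(
      url,
      done = function(x) results[[length(results) + 1]] <<- x$status_code,
      fail = function(msg) warning("Copy failed: ", msg),
      handle = h, pool = pool
    )
  })

  # run all queued copy requests concurrently
  multi_run(pool = pool)
  results
}
```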
If you want to take a look at it that would be very welcome :)
If done, it should be done in such a manner that all GCS API operations benefit, and possibly even pulled up into googleAuthR so all libraries have access to it.
Thanks Mark! I'll take a look when I have the chance, and will let you know how it goes :)
Has this ever come to fruition? A multithreading option like `gsutil`'s `-m` would be phenomenal.