googleCloudStorageR

Add parallel option to `gcs_copy_object()`

Open tkoncz opened this issue 5 years ago • 3 comments

Currently (as I understand it), gcs_copy_object() only allows a sequential copy: you loop over a list of objects and call the copy on each one separately.

However, the gsutil CLI allows a multi-threaded copy with its top-level -m option (as in gsutil -m cp): https://cloud.google.com/storage/docs/gsutil/commands/cp#description

Do you think this could be implemented in the R package?

If so, let me know if you need help with this :)

Thanks, Tamas

tkoncz, Oct 05 '20 13:10

Thanks for raising the issue. I was thinking of doing this via future, but that uses up CPU cores. I have since found curl::curl_fetch_multi(), which is much better, but it means using curl instead of httr under the hood, which is a bit more complicated. I have an implementation for Cloud Run URLs that could be used as a basis, though:

https://github.com/MarkEdmondson1234/googleCloudRunner/blob/b0db74e914b81737c57556216104f782f6313d4b/R/jwt-requests.R#L124-L154

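cr_jwt_async <- function(urls, token, ...){

  # Called when a request fails before any HTTP response arrives
  failure <- function(str){
    cat(paste("Failed request:", str), "\n", file = stderr())
  }

  # Bodies of successful responses, appended in completion order
  # (not necessarily the order of `urls`)
  results <- list()
  success <- function(x){
    if(x$status_code == 200){
      results <<- append(results, list(rawToChar(x$content)))
    } else {
      myMessage(x$status_code, "failure for request", x$url, level = 3)
    }
  }

  pool <- curl::new_pool()

  # Queue one request per URL; nothing is sent until multi_run()
  # myMessage() and cr_jwt_with_curl() come from googleCloudRunner
  lapply(urls, function(x){
    myMessage("Calling asynch: ", x, level = 3)
    h <- curl::new_handle(url = x, ...)
    h <- cr_jwt_with_curl(h = h, token = token)
    curl::curl_fetch_multi(x,
                           done = success, fail = failure,
                           handle = h, pool = pool)
  })

  # Perform all queued requests concurrently and wait for them to finish
  curl::multi_run(pool = pool)

  results
}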

If you want to take a look at it that would be very welcome :)

If this is done, it should be done in such a manner that all GCS function operations benefit, possibly even pulling it up to googleAuthR so that all libraries have access to it.
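
To make the idea of a shared, package-agnostic helper more concrete, here is a minimal sketch of what a generic version of the pattern above might look like using only the curl package. The function name `fetch_all_async()` and its `configure` argument are hypothetical and not part of googleCloudStorageR or googleAuthR; the sketch assumes text responses and leaves authentication to the caller, who could attach headers inside `configure` with `curl::handle_setheaders()`.

```r
library(curl)

# Hypothetical helper (not an existing googleAuthR/googleCloudStorageR function):
# fetch many URLs concurrently from a single R process via a curl multi pool.
# `configure` receives each fresh handle and must return it, so callers can
# add headers or options (for example an Authorization header).
fetch_all_async <- function(urls, configure = function(h) h,
                            total_con = 100, host_con = 6) {
  # Pre-size the result list so each response lands at its input position
  results <- vector("list", length(urls))
  pool <- new_pool(total_con = total_con, host_con = host_con)

  for (i in seq_along(urls)) {
    local({
      idx <- i  # capture the index for this request's callbacks
      h <- configure(new_handle())
      curl_fetch_multi(
        urls[[idx]],
        done = function(res) {
          # rawToChar() assumes a text body without embedded nul bytes
          results[[idx]] <<- list(status = res$status_code,
                                  body = rawToChar(res$content))
        },
        fail = function(msg) {
          results[[idx]] <<- list(status = NA_integer_, error = msg)
        },
        handle = h,
        pool = pool
      )
    })
  }

  # Nothing is sent until multi_run(); it blocks until the pool is drained
  multi_run(pool = pool)
  results
}
```

Unlike the Cloud Run version, this sketch stores results by input index rather than appending them as they complete, so the output lines up with `urls`, and failures are recorded instead of only being logged. Because all GCS requests go to the same host, the `host_con` limit of the pool, rather than `total_con`, is likely to be what bounds the real concurrency.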

MarkEdmondson1234, Oct 06 '20 07:10

Thanks Mark! I'll take a look when I have the chance, and will let you know how it goes :)

tkoncz, Oct 12 '20 11:10

Has this ever come to fruition? The multithreading option -m would be phenomenal.

nturaga, Dec 16 '22 21:12