
s3 put without loading binary to memory

Open yonicd opened this issue 5 years ago • 8 comments

From what I have seen so far, if I want to put an object into an S3 bucket I need to load it into memory first.

library(paws)
svc <- paws::s3(
  config = list(
    credentials = list(
      creds = list(
        access_key_id = Sys.getenv('AWS_ACCESS_KEY_ID'),
        secret_access_key = Sys.getenv('AWS_SECRET_ACCESS_KEY')
      )
    ),
    region = Sys.getenv('AWS_DEFAULT_REGION')
  )
)

x <- paste0(readLines(PATH_TO_FILE),collapse = '\n')
res <- svc$put_object(Body = x,Bucket = MY_BUCKET,Key = MY_KEY)

Is there a way to give put_* a path instead of an object?

Apologies in advance if this is documented somewhere.

yonicd, Feb 26 '20 15:02

I have used something similar to this in the past. https://github.com/cloudyr/aws.s3/blob/master/R/put_object.R

yonicd, Feb 26 '20 17:02

Unfortunately we don't have that yet. We're working on it, but I can't say when it'll be available.

davidkretch, Feb 26 '20 19:02

ok. thank you

yonicd, Feb 26 '20 20:02

It's too bad this isn't really documented either. I've spent quite a lot of time trying to save a parquet file into a bucket. It seems like you also unassigned yourself, @davidkretch. Does that mean it won't be implemented?
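For a binary file such as parquet, `readLines()` is not a safe way to build `Body`. A minimal sketch, assuming `put_object()` accepts a raw vector for `Body`, is to read the bytes with `readBin()`. The helper names `read_file_raw` and `put_file` are hypothetical, and this still loads the whole file into memory, so it does not address the original request:

```r
# Read a whole file as raw bytes. Unlike readLines(), this works for binary
# formats. The entire file is still held in memory.
read_file_raw <- function(path) {
  readBin(path, what = "raw", n = file.size(path))
}

# Hypothetical helper: `svc` is a paws S3 client such as paws::s3();
# `bucket` and `key` are placeholders supplied by the caller.
put_file <- function(svc, path, bucket, key) {
  svc$put_object(Body = read_file_raw(path), Bucket = bucket, Key = key)
}

# Usage (requires AWS credentials and an existing bucket):
# svc <- paws::s3()
# put_file(svc, "data.parquet", "my-bucket", "data.parquet")
```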

DavidArenburg, Aug 03 '20 08:08

I can't say when it's going to happen unfortunately. The person who was originally working on this started a new job. I thought I had the time for it but it has turned out not to be the case recently :-(. I'll see if I can dig up an example at least this weekend though.

davidkretch, Aug 03 '20 23:08

Hello @yonicd and @DavidArenburg, sorry for the delay. There is now an example of using multipart upload to read in and upload files in 5 MB parts here: examples/s3_multipart_upload.R.

This may eventually be added to the package but I'm not sure yet. Please let me know if you have any issues. Thank you!
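As a rough illustration of the multipart approach (a sketch, not the contents of the linked example), the function below reads a file in fixed-size chunks and sends each chunk as one part, so only one part is in memory at a time. The name `upload_multipart` and its arguments are hypothetical. It assumes `svc` is a paws S3 client exposing `create_multipart_upload`, `upload_part`, `complete_multipart_upload`, and `abort_multipart_upload`:

```r
# Hypothetical helper: upload `path` to S3 in parts of `part_size` bytes.
# S3 requires every part except the last to be at least 5 MB.
upload_multipart <- function(svc, path, bucket, key, part_size = 5 * 1024^2) {
  upload <- svc$create_multipart_upload(Bucket = bucket, Key = key)
  upload_id <- upload$UploadId

  con <- file(path, open = "rb")
  on.exit(close(con), add = TRUE)

  parts <- list()
  tryCatch({
    repeat {
      # Read the next chunk; an empty result means end of file.
      chunk <- readBin(con, what = "raw", n = part_size)
      if (length(chunk) == 0) break
      part_number <- length(parts) + 1L
      resp <- svc$upload_part(
        Body = chunk, Bucket = bucket, Key = key,
        PartNumber = part_number, UploadId = upload_id
      )
      # S3 needs each part's ETag and number to assemble the object.
      parts[[part_number]] <- list(ETag = resp$ETag, PartNumber = part_number)
    }
    svc$complete_multipart_upload(
      Bucket = bucket, Key = key, UploadId = upload_id,
      MultipartUpload = list(Parts = parts)
    )
  }, error = function(e) {
    # Abort so that already-uploaded parts do not linger in the bucket.
    svc$abort_multipart_upload(Bucket = bucket, Key = key, UploadId = upload_id)
    stop(e)
  })
}

# Usage (requires AWS credentials and an existing bucket):
# upload_multipart(paws::s3(), "big_file.parquet", "my-bucket", "big_file.parquet")
```

An empty file produces no parts, so a real implementation would likely fall back to a plain `put_object` call in that case.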

davidkretch, Aug 09 '20 17:08

What if Body could also be an object returned by httr::upload_file()? This doesn't help for very large files that still need multipart uploads, but does eliminate streaming data through R.

hadley, May 19 '21 11:05

(I'm also happy to look into implementing this, but I'd need some guidelines on how you think about R-specific extensions/wrappers to AWS APIs.)

hadley, May 19 '21 11:05