s3fs

s3fs provides a file-system like interface into Amazon Web Services for R. It utilizes paws SDKand R6 for it’s core design. This repo has been inspired by Python’s s3fs, however it’s API and implementation has been developed to follow R’s fs.

Installation

r-universe installation:

# Enable repository from dyfanjones
options(repos = c(
  dyfanjones = 'https://dyfanjones.r-universe.dev',
  CRAN = 'https://cloud.r-project.org')
)

# Download and install s3fs in R
install.packages('s3fs')

Github installation

remotes::install_github("dyfanjones/s3fs")

Dependencies

paws: connection with AWS S3
R6: Setup core class
data.table: wrangle lists into data.frames
fs: file system on local files
lgr: set up logging
future: set up async functionality
future.apply: set up parallel looping

Comparison with `fs`

s3fs attempts to give the same interface as fs when handling files on AWS S3 from R.

Vectorization. All s3fs functions are vectorized, accepting multiple path inputs similar to fs.
Predictable.
- Non-async functions return values that convey a path.
- Async functions return a future object of it’s no-async counterpart.
- The only exception will be s3_stream_in which returns a list of raw objects.
Naming conventions. s3fs functions follows fs naming conventions with dir_*, file_* and path_* however with the syntax s3_ infront i.e s3_dir_*, s3_file_* and s3_path_* etc.
Explicit failure. Similar to fs if a failure happens, then it will be raised and not masked with a warning.

Extra features:

Scalable. All s3fs functions are designed to have the option to run in parallel through the use of future and future.apply.

For example: copy a large file from one location to the next.

library(s3fs)
library(future)

plan("multisession")

s3_file_copy("s3://mybucket/multipart/large_file.csv", "s3://mybucket/new_location/large_file.csv")

s3fs to copy a large file (> 5GB) using multiparts, future allows each multipart to run in parallel to speed up the process.

Async. s3fs uses future to create a few key async functions. This is more focused on functions that might be moving large files to and from R and AWS S3.

For example: Copying a large file from AWS S3 to R.

library(s3fs)
library(future)

plan("multisession")

s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")

Usage

fs has a straight forward API with 4 core themes:

path_ for manipulating and constructing paths
file_ for files
dir_ for directories
link_ for links

s3fs follows theses themes with the following:

s3_path_ for manipulating and constructing s3 uri paths
s3_file_ for s3 files
s3_dir_ for s3 directories

NOTE: link_ is currently not supported.

library(s3fs)

# Construct a path to a file with `path()`
s3_path("foo", "bar", letters[1:3], ext = "txt")
#> [1] "s3://foo/bar/a.txt" "s3://foo/bar/b.txt" "s3://foo/bar/c.txt"

# list buckets
s3_dir_ls()
#> [1] "s3://MyBucket1"
#> [2] "s3://MyBucket2"                                        
#> [3] "s3://MyBucket3"               
#> [4] "s3://MyBucket4"                            
#> [5] "s3://MyBucket5"

# list files in bucket
s3_dir_ls("s3://MyBucket5")
#> [1] "s3://MyBucket5/iris.json"     "s3://MyBucket5/athena-query/"
#> [3] "s3://MyBucket5/data/"         "s3://MyBucket5/default/"     
#> [5] "s3://MyBucket5/iris/"         "s3://MyBucket5/made-up/"     
#> [7] "s3://MyBucket5/test_df/"

# create a new directory
tmp <- s3_dir_create(s3_file_temp(tmp_dir = "MyBucket5"))
tmp
#> [1] "s3://MyBucket5/filezwkcxx9q5562"

# create new files in that directory
s3_file_create(s3_path(tmp, "my-file.txt"))
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"
s3_dir_ls(tmp)
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"

# remove files from the directory
s3_file_delete(s3_path(tmp, "my-file.txt"))
s3_dir_ls(tmp)
#> character(0)

# remove the directory
s3_dir_delete(tmp)

^{Created on 2022-06-21 by the reprex
package (v2.0.1)}

Similar to fs, s3fs is designed to work well with the pipe.

library(s3fs)
paths <- s3_file_temp(tmp_dir = "MyBucket") |>
 s3_dir_create() |>
 s3_path(letters[1:5]) |>
 s3_file_create()
paths
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"

paths |> s3_file_delete()
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"

^{Created on 2022-06-22 by the reprex
package (v2.0.1)}

NOTE: all examples have be developed from fs.

Feedback wanted

Please open a Github ticket raising any issues or feature requests.

s3fs
s3fs copied to clipboard

Metadata

s3fs

Installation

Dependencies

Comparison with `fs`

Extra features:

Usage

Feedback wanted

← Metadata

Owner

Metadata

s3fs s3fs copied to clipboard

Metadata

s3fs

Installation

Dependencies

Comparison with fs

Extra features:

Usage

Feedback wanted

← Metadata

Owner

Metadata

s3fs
s3fs copied to clipboard

Comparison with `fs`