minioclient
More convenience wrappers (mc_aliases, read_metadata, set_alias_from_aws_env)
Here are some ideas for some more convenience wrappers:
- a function to read metadata using the --json flag for different targets (object types), providing results as a data frame
- a function that lists aliases both from "mc alias" and from any MC_HOST_*-environment variables that are available to the client
- a function set_alias_from_aws_env(), similar to mc_alias_set(), which is also aware of AWS_SESSION_TOKEN and the new AWS_ENDPOINT_URL
Metadata as a data frame
mc stat can provide metadata about objects given the target specification (files, buckets/folders or aliases).
This PR provides a function, read_metadata(). It is named like that because it is not a "straight mc command": it wraps an "mc" call using the --json flag and returns the result as a data frame, but it also branches depending on the type of the target specification (invalid/bucket/file/alias).
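To make the "JSON to data frame" step concrete, here is a minimal base R sketch of how nested metadata could be flattened into property/value rows. It is an illustration only, not the code in this PR: flatten_metadata() is a hypothetical helper, and the nested list is a hand-written stand-in for already-parsed "mc stat --json" output (the real output has more fields).

```r
# Hand-written stand-in for parsed JSON output (assumption, not real mc output).
parsed <- list(
  status = "success",
  name = "test.sql",
  size = 1024L,
  metadata = list(`Content-Type` = "application/x-sql")
)

# Hypothetical helper: flatten a nested list into property/value rows.
# unlist() joins nested names with ".", e.g. "metadata.Content-Type",
# and coerces all values to character.
flatten_metadata <- function(x) {
  flat <- unlist(x)
  data.frame(
    property = names(flat),
    value = as.character(unname(flat)),
    stringsAsFactors = FALSE
  )
}

meta <- flatten_metadata(parsed)

# Keep only the rows that describe object metadata.
meta[grepl("metadata", meta$property), ]
```

Because unlist() coerces everything to character, a value column of type chr falls out naturally from this approach.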
If provided with a target pointing to an object/file, it provides info about related metadata such as content type:
> "play/backup/backups-db/test.sql" |> read_metadata() |> dplyr::filter(grepl("metadata", property))
Metadata for play/backup/backups-db/test.sql (object)
# A tibble: 1 × 2
property value
<chr> <chr>
1 metadata.Content-Type application/x-sql
If provided a target which points to a bucket it gives metadata about that bucket:
> read_metadata("play/asiatrip")
Metadata for play/asiatrip (bucket)
# A tibble: 42 × 2
property value
<chr> <chr>
1 status success
2 name asiatrip/
3 lastModified 2023-08-07T09:45:46.393678849+02:00
4 size 0
5 location us-east-1
6 Versioning.status NA
7 Versioning.MFADelete NA
8 ObjectLock.enabled NA
9 ObjectLock.mode NA
10 ObjectLock.validity NA
# ℹ 32 more rows
# ℹ Use `print(n = ...)` to see more rows
If given an alias it provides info about buckets/folders available under this target.
> mc_metadata("play")
# A tibble: 58 × 6
status name lastModified size etag type
* <chr> <chr> <chr> <int> <chr> <chr>
1 success 000/ 1970-01-01T01:00:00+01:00 0 "" folder
2 success 2063b651-92a3-4a20-a4a5-03a96e7c5a89/ 1970-01-01T01:00:00+01:00 0 "" folder
3 success aihub/ 1970-01-01T01:00:00+01:00 0 "" folder
4 success asiatrip/ 1970-01-01T01:00:00+01:00 0 "" folder
5 success atomic/ 1970-01-01T01:00:00+01:00 0 "" folder
6 success avastars-audio-bucket/ 1970-01-01T01:00:00+01:00 0 "" folder
7 success backup/ 1970-01-01T01:00:00+01:00 0 "" folder
8 success bpsoft/ 1970-01-01T01:00:00+01:00 0 "" folder
9 success bsc-archive/ 1970-01-01T01:00:00+01:00 0 "" folder
10 success bucket/ 1970-01-01T01:00:00+01:00 0 "" folder
# ℹ 48 more rows
# ℹ Use `print(n = ...)` to see more rows
If the target points to a local object, the return value looks like this:
> read_metadata(".") |> dplyr::filter(grepl("metadata", property))
Metadata for . (bucket)
# A tibble: 2 × 2
property value
<chr> <chr>
1 metadata.Content-Type application/octet-stream
2 metadata.X-Amz-Meta-Mc-Attrs atime:1691392859#727999796/gid:1000/gname:markus/mode:16893/mtime…
Listing all connections/aliases
I believe the "mc_alias" function currently does not list the MC_HOST_* aliases that are set through system environment variables. Those MC_HOST_* aliases are convenient: in a container with the minio client installed, for example, you only need to set one MC_HOST_* environment variable and you can read and write against the source immediately, without running any "mc alias set" commands first.
The mc_aliases() function adds this functionality, i.e. listing of MC_HOST_* aliases, not by writing to the config file first but by parsing local system environment variables and adding the results to the "mc alias" listings.
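The environment-variable part of that listing can be sketched in base R as below. This is a simplified illustration, not the implementation in the PR: list_env_aliases() is a hypothetical name, and it only returns alias names so that no credentials end up in the output.

```r
# Hypothetical helper: find aliases defined through MC_HOST_* variables.
# `env` defaults to the current process environment but can be any
# named character vector, which keeps the function easy to test.
list_env_aliases <- function(env = Sys.getenv()) {
  nm <- names(env)
  hits <- grepl("^MC_HOST_", nm)
  data.frame(
    alias = sub("^MC_HOST_", "", nm[hits]),
    source = rep("env", sum(hits)),  # rep() keeps this valid for zero hits
    stringsAsFactors = FALSE
  )
}

# Example with a made-up environment (values are placeholders).
demo_env <- c(
  HOME = "/home/user",
  MC_HOST_play = "https://KEY:SECRET@play.min.io",
  MC_HOST_aws = "https://KEY:SECRET@s3.example.org"
)
list_env_aliases(demo_env)
```

The resulting rows could then be row-bound to whatever the regular "mc alias" listing returns.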
Picking up new AWS "standardized" environment variables
For working with AWS environment variables such as those used with the "aws s3" command line tools, there is a function that picks up the new "AWS_ENDPOINT_URL" variable (recently "standardized" after 7-8 years, see https://docs.aws.amazon.com/sdkref/latest/guide/feature-ss-endpoints.html) and converts it into an MC_HOST_*-style "alias" string:
parse_aws_env() |> setNames(nm = "my_aws_s3") |> parse_mc_host_env(show_secret = FALSE)
The helper function set_alias_from_aws_env() can do this in one step from the current AWS variables specified on the host, similar to what mc_alias_set() already does, but it also picks up AWS_ENDPOINT_URL and any session token that may be set.
The unset_mc_env() function can remove such an MC_HOST_* setting from the system environment:
set_alias_from_aws_env("aws")
mc_ls("aws")
unset_mc_env("aws")
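For readers wondering what the conversion from AWS variables to an MC_HOST_* value involves, here is a rough sketch. It assumes the value follows the layout scheme://ACCESS_KEY:SECRET_KEY[:SESSION_TOKEN]@host, and build_mc_host() is a hypothetical helper, not a function from this PR. Credentials containing characters such as "/" or "@" would probably need escaping, which this sketch does not attempt.

```r
# Hypothetical helper: combine an endpoint URL and credentials into an
# MC_HOST_*-style string (layout is an assumption, see the text above).
build_mc_host <- function(endpoint, access_key, secret_key, session_token = "") {
  # Split "https://host[:port]" into scheme and host parts.
  m <- regmatches(endpoint, regexec("^(https?)://(.+)$", endpoint))[[1]]
  if (length(m) != 3) {
    stop("endpoint must start with http:// or https://")
  }
  # Append the session token only when one is set.
  creds <- paste(
    c(access_key, secret_key, if (nzchar(session_token)) session_token),
    collapse = ":"
  )
  sprintf("%s://%s@%s", m[2], creds, m[3])
}

# Placeholder credentials, for illustration only.
build_mc_host("https://s3.example.org", "AK", "SK")
build_mc_host("https://s3.example.org", "AK", "SK", session_token = "TOK")

# In a real session the inputs would come from the environment, e.g.
# Sys.setenv(MC_HOST_aws = build_mc_host(
#   Sys.getenv("AWS_ENDPOINT_URL"),
#   Sys.getenv("AWS_ACCESS_KEY_ID"),
#   Sys.getenv("AWS_SECRET_ACCESS_KEY"),
#   Sys.getenv("AWS_SESSION_TOKEN")
# ))
```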
These ideas are still draftish (no tests added, not much docs) ... I wanted some feedback first before doing anything else. Maybe I should put this PR as a "draft"?
Thanks, this looks interesting! Still need to give it a closer read, but I like the ideas.
I'm a bit nervous about how best to go about expanding the namespace. I see the logic in adding these behaviors under new, non-mc-prefixed functions since they don't have a 1:1 map to the mc cli tool, but I'm also wondering if that makes them more confusing or harder to discover? Specifically:
- regarding MC_HOST_* variables and support for AWS_SESSION_TOKEN and AWS_ENDPOINT_URL, I'm wondering if it wouldn't be better to roll these in as additional optional arguments to mc_alias_* methods? e.g. mc_alias_ls(all=TRUE) or something like that to include those defined by env vars?
- minor quibble, but set_alias_from_aws_env is just a bit of a mouthful; I think on balance we're better leaving the user to manage their env vars how they see fit (Sys.setenv, .Renviron, etc).
- I think you have mc_metadata() above in some places where you meant read_metadata()? Conceptually trying to figure out where this fits in best -- might make more sense as a data.frame version of mc_stat(), similar to the data.frame version of mc_ls() you've added? Could potentially have optional arguments on mc_stat() to opt in to the extra handling regarding target type? (does that make sense? I'm still wrapping my head around that use case).
p.s. v0.0.4 is now on CRAN, thanks!
Thanks for the feedback! Slow response here (sorry!), I'm currently on vacation (and will be back in one week).
I will refactor based on the feedback you provided and add some tests and documentation as well.
It looked like install_mc() might in some cases run into a 60 s timeout during the download (or it might have been a glitch on my system; perhaps my network connection is slower than normal). In any case, I was getting a 1.2 MB file instead of a 25 MB binary, which looks like an interrupted partial download. I tried changing install_mc() to raise the timeout, and I think this resolved some of the issues I had, although a file locking issue then popped up saying "Text file busy". See below for some more details:
# inspect the downloaded mc binary
$ /tmp/Rtmp1FRmo0/R/minioclient/mc
bash: /tmp/Rtmp1FRmo0/R/minioclient/mc: Permission denied
$ file /tmp/Rtmp1FRmo0/R/minioclient/mc
/tmp/Rtmp1FRmo0/R/minioclient/mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, too large section header offset 17871072
$ ls -lahtr /tmp/Rtmp1FRmo0/R/minioclient/mc
-rw-rw-r-- 1 markus markus 1,2M Aug 16 05:50 /tmp/Rtmp1FRmo0/R/minioclient/mc
# compare to mc installed on system already
$ which mc
/home/markus/bin/mc
$ file /home/markus/bin/mc
/home/markus/bin/mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=Lr0J6O5pFxw7RMoUF9sw/xqy0ePJpkB0SFkvv2bze/opBNPd4RwmihK4KXj-xZ/hcqsVHzxwQYhnZdHUIuY, stripped
# do a manual download
$ mkdir temp
$ cd temp
$ wget https://dl.min.io/client/mc/release/linux-amd64/mc
$ chmod 644 mc
$ file mc
mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=4TaE2-EdKzib64JJCWTb/kg6_xhX3UEvvNqjQjczh/4RjqbnbC4bASq6C5GioW/xHv-Hji5GPt5wOj_ToVx, stripped
# could it be a timeout when downloading causing the problem?
# try to change timeout in the install_mc()-function
$ file /tmp/Rtmp1FRmo0/R/minioclient/mc
/tmp/Rtmp1FRmo0/R/minioclient/mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=4TaE2-EdKzib64JJCWTb/kg6_xhX3UEvvNqjQjczh/4RjqbnbC4bASq6C5GioW/xHv-Hji5GPt5wOj_ToVx, stripped
$ /tmp/Rtmp1FRmo0/R/minioclient/mc
bash: /tmp/Rtmp1FRmo0/R/minioclient/mc: Text file busy
# hmmm... another problem, see https://stackoverflow.com/questions/16764946/what-generates-the-text-file-busy-message-in-unix
$ lsof /tmp/Rtmp1FRmo0/R/minioclient/mc
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsession 3618 markus 41w REG 253,1 26382336 22151211 /tmp/Rtmp1FRmo0/R/minioclient/mc
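The timeout change described above could be sketched roughly as follows. This is not the actual patch to install_mc(): download_with_timeout() and its min_bytes argument are hypothetical, and the sketch only relies on base R's "timeout" option, which download.file() honours. Downloading to a ".part" file first means a truncated download never ends up at the final path.

```r
# Hypothetical helper: download with a raised timeout, then move the
# file into place only if the download looks complete.
download_with_timeout <- function(url, dest, timeout = 300, min_bytes = 0) {
  # Raise the timeout for this call only; restore the old value on exit.
  old <- options(timeout = max(timeout, getOption("timeout")))
  on.exit(options(old), add = TRUE)

  tmp <- paste0(dest, ".part")
  on.exit(unlink(tmp), add = TRUE)  # harmless once tmp has been renamed

  status <- utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
  if (status != 0 || file.size(tmp) < min_bytes) {
    stop("download failed or file is smaller than expected")
  }

  file.rename(tmp, dest)
  Sys.chmod(dest, mode = "0755")  # make the binary executable
  invisible(dest)
}

# Possible usage (not run here, needs network access):
# download_with_timeout(
#   "https://dl.min.io/client/mc/release/linux-amd64/mc",
#   file.path(tempdir(), "mc"),
#   min_bytes = 10 * 1024^2
# )
```

A size threshold is only a crude check; comparing against a published checksum would be more robust.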
@mskyttner apologies for the interruption here. We're back on CRAN (needed special permission to have a package which installs a binary from the web, also needed to be more careful about tempdir and config dir).
- Can you open a new issue or PR regarding the install_mc() timeout? I haven't hit that, but I'm usually working on high-bandwidth machines. I believe the standard thing would be to also download the corresponding checksum (e.g. see https://dl.min.io/client/mc/release/linux-amd64/) and confirm the checksum matches. (The binary could also be changed by a man-in-the-middle attack, etc! Though with timeouts, the download should error if the downloaded size doesn't match the size in the http header, I think?) Anyway, CRAN would probably be happier if we verified checksums of binaries too.
- Where did we get with this PR? It looks good, but I think we've accumulated a lot of unrelated changes here? Is it possible to break this up a bit into multiple PRs?
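The checksum verification suggested in the first point above could look roughly like the sketch below. verify_checksum() is a hypothetical helper, and it assumes the published checksum file contains a line of the form "<hash>  <filename>". To stay within base R the default hash function is tools::md5sum(); for a SHA-256 checksum a different hash_fun would be needed, for instance a wrapper around digest::digest(path, algo = "sha256", file = TRUE) if the digest package is acceptable as a dependency.

```r
# Hypothetical helper: compare a file's hash with a published checksum line.
verify_checksum <- function(path, checksum_line, hash_fun = tools::md5sum) {
  # First whitespace-separated field of the line is the expected hash.
  expected <- tolower(strsplit(trimws(checksum_line), "[[:space:]]+")[[1]][1])
  actual <- tolower(unname(hash_fun(path)))
  identical(actual, expected)
}

# Self-contained demo with a temporary file instead of a real download.
demo_file <- tempfile()
writeLines("pretend this is the mc binary", demo_file)

# Build the checksum line from the file itself, as a stand-in for the
# published checksum file.
good_line <- paste0(tools::md5sum(demo_file), "  mc")
verify_checksum(demo_file, good_line)

# A truncated or tampered file no longer matches.
writeLines("truncated", demo_file)
verify_checksum(demo_file, good_line)
```

A check like this would catch both the partial download described earlier and a modified binary, provided the checksum itself is fetched over a trusted channel.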