
More convenience wrappers (mc_aliases, read_metadata, set_alias_from_aws_env)

Open mskyttner opened this issue 1 year ago • 5 comments

Here are some ideas for some more convenience wrappers:

  • a function to read metadata using the --json flag for different targets (object types), providing results as a data frame
  • a function that lists aliases both from "mc alias" and from any MC_HOST_*-environment variables that are available to the client
  • a function set_alias_from_aws_env similar to mc_alias_set which also is aware of the AWS_SESSION_TOKEN and the new AWS_ENDPOINT_URL

Metadata as a data frame

mc stat can provide metadata about objects given the target specification (files, buckets/folders or aliases).

This PR provides a function, read_metadata(). It is named that way because it is not a "straight mc command": it wraps an "mc" call using the --json flag and returns the result as a data frame, but it also branches depending on the type of the target specification (invalid/bucket/file/alias).

If provided with a target pointing to an object/file, it provides info about related metadata such as content type:

> "play/backup/backups-db/test.sql" |> read_metadata() |> dplyr::filter(grepl("metadata", property))
Metadata for play/backup/backups-db/test.sql (object)
# A tibble: 1 × 2
  property              value            
  <chr>                 <chr>            
1 metadata.Content-Type application/x-sql
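
The property/value layout above can be obtained by flattening the parsed JSON. The following is only a minimal base-R sketch, assuming the output of "mc stat --json" has already been parsed into a nested list (for instance with jsonlite::fromJSON(txt, simplifyVector = FALSE)). The helper flatten_metadata() and the sample record are hypothetical and are not the PR's implementation:

```r
# Hypothetical helper: flatten a parsed `mc stat --json` record (a nested list)
# into a two-column property/value data frame.
flatten_metadata <- function(parsed) {
  # unlist() joins nested names with ".", e.g. "metadata.Content-Type"
  flat <- unlist(parsed)
  data.frame(
    property = names(flat),
    value = as.character(unname(flat)),
    stringsAsFactors = FALSE
  )
}

# Made-up record that mimics the shape of the object example above.
record <- list(
  status = "success",
  name = "test.sql",
  size = 1024L,
  metadata = list(`Content-Type` = "application/x-sql")
)

md <- flatten_metadata(record)

# Keep only the rows that describe object metadata.
md[grepl("metadata", md$property), ]
```

Because every value is coerced to character, the result stays a plain long table regardless of which fields a given target type returns.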

If provided a target which points to a bucket it gives metadata about that bucket:

> read_metadata("play/asiatrip")
Metadata for play/asiatrip (bucket)
# A tibble: 42 × 2
   property             value                              
   <chr>                <chr>                              
 1 status               success                            
 2 name                 asiatrip/                          
 3 lastModified         2023-08-07T09:45:46.393678849+02:00
 4 size                 0                                  
 5 location             us-east-1                          
 6 Versioning.status    NA                                 
 7 Versioning.MFADelete NA                                 
 8 ObjectLock.enabled   NA                                 
 9 ObjectLock.mode      NA                                 
10 ObjectLock.validity  NA                                 
# ℹ 32 more rows
# ℹ Use `print(n = ...)` to see more rows

If given an alias it provides info about buckets/folders available under this target.

> mc_metadata("play")
# A tibble: 58 × 6
   status  name                                  lastModified               size etag  type  
 * <chr>   <chr>                                 <chr>                     <int> <chr> <chr> 
 1 success 000/                                  1970-01-01T01:00:00+01:00     0 ""    folder
 2 success 2063b651-92a3-4a20-a4a5-03a96e7c5a89/ 1970-01-01T01:00:00+01:00     0 ""    folder
 3 success aihub/                                1970-01-01T01:00:00+01:00     0 ""    folder
 4 success asiatrip/                             1970-01-01T01:00:00+01:00     0 ""    folder
 5 success atomic/                               1970-01-01T01:00:00+01:00     0 ""    folder
 6 success avastars-audio-bucket/                1970-01-01T01:00:00+01:00     0 ""    folder
 7 success backup/                               1970-01-01T01:00:00+01:00     0 ""    folder
 8 success bpsoft/                               1970-01-01T01:00:00+01:00     0 ""    folder
 9 success bsc-archive/                          1970-01-01T01:00:00+01:00     0 ""    folder
10 success bucket/                               1970-01-01T01:00:00+01:00     0 ""    folder
# ℹ 48 more rows
# ℹ Use `print(n = ...)` to see more rows

If the target points to a local object, the return value looks like this:

> read_metadata(".") |> dplyr::filter(grepl("metadata", property))
Metadata for . (bucket)
# A tibble: 2 × 2
  property                     value                                                             
  <chr>                        <chr>                                                             
1 metadata.Content-Type        application/octet-stream                                          
2 metadata.X-Amz-Meta-Mc-Attrs atime:1691392859#727999796/gid:1000/gname:markus/mode:16893/mtime…

Listing all connections/aliases

I believe that the "mc_alias" function currently does not list the MC_HOST_* aliases that are set through system environment variables. I think those MC_HOST_* aliases are convenient: in a container with the minio client installed, for example, you just need to set one of those MC_HOST_* env variables and you are ready to read and write against the source immediately, without needing to run any "mc alias set" commands first.

The mc_aliases() function adds this functionality, i.e. listing of MC_HOST_* aliases. It does not write them to the config file first; instead it parses the local system environment variables and adds the results to the "mc alias" listing.
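
As an illustration of the environment-variable parsing step, here is a base-R sketch. It assumes MC_HOST_* values of the form scheme://ACCESS:SECRET[:TOKEN]@host; the function name list_mc_host_env(), its columns and the demo credentials are hypothetical, and this is not the code in the PR:

```r
# Hypothetical helper: list aliases defined through MC_HOST_<alias> variables.
# Assumes values look like scheme://ACCESS:SECRET[:TOKEN]@host.
list_mc_host_env <- function(env = Sys.getenv(), show_secret = FALSE) {
  keep <- grepl("^MC_HOST_", names(env))
  aliases <- sub("^MC_HOST_", "", names(env)[keep])
  values <- as.character(env)[keep]

  if (length(values) == 0) {
    return(data.frame(
      alias = character(), endpoint = character(), access_key = character(),
      secret_key = character(), has_token = logical(), stringsAsFactors = FALSE
    ))
  }

  rows <- lapply(seq_along(values), function(i) {
    scheme <- sub("://.*$", "", values[i])
    rest <- sub("^[^:]+://", "", values[i])
    # The last "@" separates the credentials from the host.
    at <- max(gregexpr("@", rest, fixed = TRUE)[[1]])
    creds <- strsplit(substr(rest, 1, at - 1), ":", fixed = TRUE)[[1]]
    data.frame(
      alias = aliases[i],
      endpoint = paste0(scheme, "://", substr(rest, at + 1, nchar(rest))),
      access_key = creds[1],
      secret_key = if (show_secret) creds[2] else "***",  # masked by default
      has_token = length(creds) >= 3,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Demo with a made-up environment instead of the real one.
demo_env <- c(
  HOME = "/home/user",
  MC_HOST_play = "https://ACCESSKEY:SECRETKEY@play.min.io"
)
res <- list_mc_host_env(demo_env)
res
```

Rows like these could then be combined with the aliases read from the config file, with an extra column recording where each alias came from.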

Picking up new AWS "standardized" environment variables

For working with AWS environment variables such as those used with the "aws s3" command line tools, there is a function that picks up the new "AWS_ENDPOINT_URL" variable (recently "standardized" after 7-8 years, see https://docs.aws.amazon.com/sdkref/latest/guide/feature-ss-endpoints.html) and converts it into an MC_HOST_*-style "alias" string:

parse_aws_env() |> setNames(nm = "my_aws_s3") |> parse_mc_host_env(show_secret = FALSE)

The helper function set_alias_from_aws_env() can do this in one step from the current AWS variables specified on the host, similar to what mc_alias_set() already does but also picking up the AWS_ENDPOINT_URL and any session token that may be set.

The unset_mc_env() function can remove such an MC_HOST_* setting from the system environment:

set_alias_from_aws_env("aws")
mc_ls("aws")
unset_mc_env("aws")
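
To make the conversion concrete, here is a rough base-R sketch of how AWS_* variables could be turned into an MC_HOST_*-style value. The names aws_env_to_mc_host() and set_mc_host(), the fallback endpoint https://s3.amazonaws.com and the demo credentials are assumptions for illustration only; the PR's own helpers may work differently:

```r
# Hypothetical helper: build an MC_HOST_*-style value from AWS_* variables.
aws_env_to_mc_host <- function(env = Sys.getenv()) {
  # Look up a variable, treating unset and empty values the same way.
  get_var <- function(name, default = "") {
    value <- as.character(env)[match(name, names(env))]
    if (is.na(value) || !nzchar(value)) default else value
  }

  key <- get_var("AWS_ACCESS_KEY_ID")
  secret <- get_var("AWS_SECRET_ACCESS_KEY")
  token <- get_var("AWS_SESSION_TOKEN")
  # Assumed fallback when AWS_ENDPOINT_URL is not set.
  endpoint <- get_var("AWS_ENDPOINT_URL", "https://s3.amazonaws.com")

  if (!nzchar(key) || !nzchar(secret)) {
    stop("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set")
  }

  scheme <- sub("://.*$", "", endpoint)
  host <- sub("^[^:]+://", "", endpoint)
  # The session token is appended only when present.
  creds <- paste(c(key, secret, if (nzchar(token)) token), collapse = ":")
  paste0(scheme, "://", creds, "@", host)
}

# Hypothetical helper: export the value as MC_HOST_<alias> for this R session.
set_mc_host <- function(alias, value) {
  do.call(Sys.setenv, stats::setNames(list(value), paste0("MC_HOST_", alias)))
}

# Demo with made-up credentials rather than the real environment.
demo_env <- c(
  AWS_ACCESS_KEY_ID = "ACCESSKEY",
  AWS_SECRET_ACCESS_KEY = "SECRETKEY",
  AWS_SESSION_TOKEN = "TOKEN",
  AWS_ENDPOINT_URL = "https://s3.example.org"
)
value <- aws_env_to_mc_host(demo_env)
set_mc_host("aws_demo", value)
exported <- Sys.getenv("MC_HOST_aws_demo")
Sys.unsetenv("MC_HOST_aws_demo")  # clean up again
```

The sketch does no escaping of special characters in the keys, so credentials containing ":" or "@" would need extra handling.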

mskyttner, Aug 07 '23 09:08

These ideas are still draftish (no tests added, not much docs) ... I wanted some feedback first before doing anything else. Maybe I should put this PR as a "draft"?

mskyttner, Aug 07 '23 10:08

Thanks, this looks interesting! Still need to give it a closer read, but I like the ideas.

I'm a bit nervous about how best to go about expanding the namespace. I see the logic in adding these behaviors under new, non-mc-prefixed functions since they don't map 1:1 to the mc cli tool, but I'm also wondering if that makes them more confusing or harder to discover? Specifically:

  • regarding MC_HOST_* variables and support for AWS_SESSION_TOKEN and AWS_ENDPOINT_URL, I'm wondering if it wouldn't be better to roll these in as additional optional arguments to mc_alias_* methods? e.g. mc_alias_ls(all=TRUE) or something like that to include those defined by env vars?

  • minor quibble, but set_alias_from_aws_env is just a bit of a mouthful; I think on balance we're better leaving the user to manage their env vars how they see fit (Sys.setenv, .Renviron, etc).

  • I think you have mc_metadata() above in some places where you meant read_metadata()? Conceptually trying to figure out where this fits in best -- might make more sense as a data.frame version of mc_stat(), similar to the data.frame version of mc_ls() you've added? Could potentially have optional arguments on mc_stat() to opt in to the extra handling regarding target type? (does that make sense? I'm still wrapping my head around that use case).

p.s. v0.0.4 is now on CRAN, thanks!

cboettig, Aug 09 '23 18:08

Thanks for the feedback! Slow response here (sorry!), I'm currently on vacation (and will be back in one week).

I will refactor based on the feedback you provided and add some tests and documentation as well.

mskyttner, Aug 15 '23 16:08

It looked like install_mc() might in some cases run into a 60 s timeout during the download process (or it might have been a glitch on my system; perhaps my network connection is slower than normal). In any case, I was getting a 1.2 MB file instead of a 25 MB binary (kind of like an interrupted partial download, maybe?). I tried making some changes to install_mc() to raise the timeout, and I think this might have resolved some of the issues I had (there was also some kind of file locking issue that popped up saying "Text file busy"). See below for some more details:

# inspect the downloaded mc binary

$ /tmp/Rtmp1FRmo0/R/minioclient/mc
bash: /tmp/Rtmp1FRmo0/R/minioclient/mc: Permission denied

$ file /tmp/Rtmp1FRmo0/R/minioclient/mc
/tmp/Rtmp1FRmo0/R/minioclient/mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, too large section header offset 17871072

$ ls -lahtr /tmp/Rtmp1FRmo0/R/minioclient/mc
-rw-rw-r-- 1 markus markus 1,2M Aug 16 05:50 /tmp/Rtmp1FRmo0/R/minioclient/mc


# compare to mc installed on system already
$ which mc
/home/markus/bin/mc

$ file /home/markus/bin/mc
/home/markus/bin/mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=Lr0J6O5pFxw7RMoUF9sw/xqy0ePJpkB0SFkvv2bze/opBNPd4RwmihK4KXj-xZ/hcqsVHzxwQYhnZdHUIuY, stripped


# do a manual download
$ mkdir temp
$ cd temp
$ wget https://dl.min.io/client/mc/release/linux-amd64/mc
$ chmod 644 mc 
$ file mc
mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=4TaE2-EdKzib64JJCWTb/kg6_xhX3UEvvNqjQjczh/4RjqbnbC4bASq6C5GioW/xHv-Hji5GPt5wOj_ToVx, stripped



# could it be a timeout when downloading causing the problem?
# try to change timeout in the install_mc()-function

$ file /tmp/Rtmp1FRmo0/R/minioclient/mc
/tmp/Rtmp1FRmo0/R/minioclient/mc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=4TaE2-EdKzib64JJCWTb/kg6_xhX3UEvvNqjQjczh/4RjqbnbC4bASq6C5GioW/xHv-Hji5GPt5wOj_ToVx, stripped

$ /tmp/Rtmp1FRmo0/R/minioclient/mc
bash: /tmp/Rtmp1FRmo0/R/minioclient/mc: Text file busy

# hmmm... another problem, see https://stackoverflow.com/questions/16764946/what-generates-the-text-file-busy-message-in-unix

$ lsof /tmp/Rtmp1FRmo0/R/minioclient/mc 
COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
rsession 3618 markus   41w   REG  253,1 26382336 22151211 /tmp/Rtmp1FRmo0/R/minioclient/mc

mskyttner, Aug 16 '23 06:08

@mskyttner apologies for the interruption here. We're back on CRAN (needed special permission to have a package which installs a binary from the web, also needed to be more careful about tempdir and config dir).

  • Can you open a new issue or PR regarding the install_mc() timeout? I haven't hit that, but I'm usually working on high-bandwidth machines. I believe the standard thing would be to also download the corresponding checksum (e.g. see https://dl.min.io/client/mc/release/linux-amd64/) and confirm that the checksum matches. (The file could also be changed by a man-in-the-middle attack, etc.! Though with timeouts, the download should error if the downloaded size doesn't match the size in the http header, I think?) Anyway, CRAN would probably be happier if we verified checksums of binaries too.

  • Where did we get with this PR? It looks good but we've accumulated a lot of unrelated changes here I think? Is it possible to break this up a bit into multiple PRs?
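
For the timeout and checksum points in the first bullet, something along these lines might work. It is only a sketch: download_with_timeout(), parse_sha256_line() and verify_sha256() are hypothetical names, the checksum file is assumed to contain a line of the form "<hex digest>  <file name>", and the digest package is assumed to be installed. download.file() honours getOption("timeout"), which defaults to 60 seconds.

```r
# Hypothetical helper: download with a temporarily raised timeout.
download_with_timeout <- function(url, dest, timeout = 300) {
  old <- options(timeout = max(timeout, getOption("timeout")))
  on.exit(options(old), add = TRUE)  # restore the previous setting afterwards
  utils::download.file(url, dest, mode = "wb", quiet = TRUE)  # "wb" keeps binaries intact
  invisible(dest)
}

# Hypothetical helper: extract the digest from a "<hex digest>  <file name>" line.
parse_sha256_line <- function(line) {
  tolower(strsplit(trimws(line), "[[:space:]]+")[[1]][1])
}

# Hypothetical helper: compare a file's SHA-256 digest with the expected value.
# Assumes the digest package is available.
verify_sha256 <- function(path, expected) {
  actual <- digest::digest(path, algo = "sha256", file = TRUE)
  identical(tolower(actual), tolower(expected))
}

# Intended use (not run here; the checksum URL is an assumption):
# bin <- "https://dl.min.io/client/mc/release/linux-amd64/mc"
# download_with_timeout(bin, "mc")
# download_with_timeout(paste0(bin, ".sha256sum"), "mc.sha256sum")
# expected <- parse_sha256_line(readLines("mc.sha256sum", n = 1))
# if (!verify_sha256("mc", expected)) stop("checksum mismatch, download incomplete?")
```

A truncated download like the 1.2 MB file reported above would fail this comparison, so the installer could delete the file and retry instead of leaving a broken binary in place.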

cboettig, Sep 15 '23 19:09