
board_url() can't download pins from boards created with board_folder()

Open innir opened this issue 3 years ago • 9 comments

Hi,

looking at the board_url() documentation

A named character vector of URLs. If the URL ends in a /, board_url will look for a data.txt that provides metadata. The easiest way to generate this file is to upload a pin directory created by board_folder().

it seems to be easy to create a board locally and write a pin to it with

> library(pins)
> board <- board_folder("test")
> board %>% pin_write(mtcars)
Using `name = 'mtcars'`
Guessing `type = 'rds'`
Creating new version '20220712T143218Z-66143'
Writing to pin 'mtcars'
> board$versioned
[1] FALSE
> fs::dir_tree("test")
test
└── mtcars
    └── 20220712T143218Z-66143
        ├── data.txt
        └── mtcars.rds

Now one should be able to just copy the folder test to some http server and get the pin mtcars back like

board <- board_url(c(mtcars = "https://example.com/test/mtcars/"))
board %>% pin_read("mtcars")

but the directory tree and the code at https://github.com/rstudio/pins/blob/main/R/board_url.R#L84-L90 make it obvious that this will not (and does not) work. The code expects data.txt at https://example.com/test/mtcars/data.txt, while it is actually at https://example.com/test/mtcars/20220712T143218Z-66143/data.txt.

innir avatar Jul 12 '22 14:07 innir

Hey--I wonder if this could be clarified in the documentation, but from what I can tell this phrase...

upload a pin directory created by board_folder(). [emph added]

Should not be referring to the board itself, or even a pin directory, but a pin version directory. If you were to paste that version folder ('20220712T143218Z-66143') somewhere, then you could use board_url to refer to it.
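A minimal sketch of that workaround, reusing the version ID from the listing above. The helper `version_url()` is hypothetical (it is not part of pins) and `example.com` is a placeholder host, so the `board_url()` call is left commented out:

```r
# Hypothetical helper (not a pins function): build the URL of one pin
# *version* directory under a served folder board.
version_url <- function(base_url, pin, version) {
  # Drop any trailing slashes on the base, then join with single slashes.
  paste0(sub("/+$", "", base_url), "/", pin, "/", version, "/")
}

url <- version_url("https://example.com/test", "mtcars", "20220712T143218Z-66143")
print(url)

# With a real server, the trailing "/" makes board_url() look for data.txt
# inside the version directory, which is where board_folder() wrote it:
# board <- pins::board_url(c(mtcars = url))
# pins::pin_read(board, "mtcars")
```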

I'm not sure there's a simple way to support broader board functionality using board_url, without a way to list folders (which is not impossible with HTTP, but also dependent on HTTP servers doing things in a consistent way.)

Could you say more about the case you're trying to use it for?

machow avatar Jul 12 '22 16:07 machow

True, one could use the pin version, but that would render it rather useless, as every update of the pin would force the reader of that pin to change the URL :-/ For me the question is more: why does board_folder("test", versioned = FALSE) add a versioned directory?

My use case is basically a poor man's API ... I frequently update pins on a server and want to consume the data on clients ... it seemed to be a very easy solution involving just an HTTP server serving static files ... (and some cron job updating the pins)
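One possible stopgap for the cron-job side, sketched in base R under the assumption that copying the files of a version directory to another location keeps them readable by board_url() (as suggested above). `publish_latest()` and the directory names are hypothetical, and the demo fakes the folder layout instead of calling pins:

```r
# Hypothetical helper (not part of pins): copy the newest version of a pin
# into a directory with a fixed name, so the served URL never changes.
publish_latest <- function(board_dir, pin, out_dir) {
  versions <- list.dirs(file.path(board_dir, pin), recursive = FALSE)
  if (length(versions) == 0) stop("no versions found for pin: ", pin)
  # Version names start with a UTC timestamp, so the last name in sorted
  # order is the newest version.
  latest <- versions[order(basename(versions), method = "radix")]
  latest <- latest[length(latest)]
  dest <- file.path(out_dir, pin)
  unlink(dest, recursive = TRUE)
  dir.create(dest, recursive = TRUE)
  file.copy(list.files(latest, full.names = TRUE), dest)
  invisible(dest)
}

# Demo on a faked folder-board layout (two versions of one pin).
demo_board <- tempfile("board")
for (v in c("20220712T143218Z-66143", "20220713T080000Z-aaaaa")) {
  dir.create(file.path(demo_board, "mtcars", v), recursive = TRUE)
  writeLines(v, file.path(demo_board, "mtcars", v, "data.txt"))
}
stable <- publish_latest(demo_board, "mtcars", tempfile("served"))
print(list.files(stable))
```

Clients would then point board_url() at the stable directory, for example `c(mtcars = "https://example.com/served/mtcars/")`, where the host and path are placeholders.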

innir avatar Jul 12 '22 17:07 innir

I came here on the same path as @innir: I want to create using board_folder(), then serve using board_url() with a predictable, unchanging URL.

When I create a board_folder(), versioned = FALSE is the default (as you know). To me, this suggests that the data.txt and the supporting data files would be stored in the directory named for the pin, rather than in a subdirectory named for a version.

In essence, like the way the examples are set up for board_url():

github_raw <- "https://raw.githubusercontent.com/"
board <- board_url(c(
  files = paste0(github_raw, "rstudio/pins/master/tests/testthat/pin-files/"),
  rds = paste0(github_raw, "rstudio/pins/master/tests/testthat/pin-rds/"),
  raw = paste0(github_raw, "rstudio/pins/master/tests/testthat/pin-files/first.txt")
))

Thanks!

ijlyttle avatar Aug 05 '22 22:08 ijlyttle

This may be a bad (and unrelated) idea, but could board_url() support a manifest file?

That is, if you intend to serve a board from a web-server, you could call a function to generate a file in its root called something like manifest.json (or whatever), which would contain the names of all the pins and versions. Then, if you called board_url() ~without arguments~ with a single unnamed argument: the base url, it would look for this manifest file to build the board?
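To make the idea concrete, here is a rough base-R sketch of generating such a file from a folder board. Both function names and the YAML-like output layout are assumptions for illustration only, not an existing pins API:

```r
# Hypothetical sketch (not a pins function): map every pin in a folder
# board to the relative paths of its version directories.
build_manifest <- function(board_dir) {
  pins <- list.dirs(board_dir, full.names = FALSE, recursive = FALSE)
  manifest <- lapply(pins, function(pin) {
    versions <- list.dirs(file.path(board_dir, pin),
                          full.names = FALSE, recursive = FALSE)
    paste0(pin, "/", sort(versions, method = "radix"), "/")
  })
  names(manifest) <- pins
  manifest
}

# Write the manifest as simple "pin:" / "- path" lines.
write_manifest <- function(manifest, path) {
  lines <- unlist(lapply(names(manifest), function(pin) {
    c(paste0(pin, ":"), paste0("- ", manifest[[pin]]))
  }))
  writeLines(lines, path)
}

# Demo on a faked layout: one pin with a single version.
demo_dir <- tempfile("board")
dir.create(file.path(demo_dir, "mtcars", "20220712T143218Z-66143"),
           recursive = TRUE)
manifest <- build_manifest(demo_dir)
write_manifest(manifest, file.path(demo_dir, "manifest.txt"))
print(readLines(file.path(demo_dir, "manifest.txt")))
```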

¯\_(ツ)_/¯

ijlyttle avatar Aug 05 '22 22:08 ijlyttle

I have made a proof-of-concept: https://ijlyttle.github.io/pinsManifest/

This implements the manifest idea, using a file called pins.txt. It can create a manifest only for board_folder() and can be used to create (read from) a board_url().

I wrote this package in hopes that the ideas could be integrated into pins itself (also pins for Python (in the future, maybe also JS?)). In that spirit, I'd be happy to create a new issue and contribute to a PR.

Thanks!

ijlyttle avatar Aug 07 '22 15:08 ijlyttle

Ah thanks for this incredible prototype--It seems like a really useful feature!

It seems like a tricky piece here is adding useful things while keeping a narrow scope in pins. Right now, pins handles..

  • reading / writing to backends that act like filesystems (e.g. S3, a local folder). Writing a pin means you can list / read it.
  • reading pins in a limited way from board_url().

I wonder if one way to make board_url() more useful--while keeping pins' scope narrow--could be..

  • add an argument to board_url to tell it to use a yaml manifest (like in your demo)
  • explain what the structure of a manifest should be in the docs
  • leave to the user the creation of the manifest

The big advantage AFAICT of the manifest is that users reading from a board could use just the http address, while the person curating the board could update the manifest to add pins, etc..

WDYT? (cc @juliasilge )

machow avatar Aug 08 '22 14:08 machow

Thanks @machow!

I think you summed up exactly what the manifest aims to do, and I can appreciate keeping a narrow scope. Mindful that this is not my place, I think a manifest file could make board_url() essentially a read-only version of other boards:

  • could it support versioning?
  • if the file structure is the same for, say, board_s3(), could you add a manifest file, then make the bucket public to serve it for board_url()?

I agree that the creation of a manifest has to be the responsibility of the curator, along the lines of invoking renv::snapshot() as needed.

I have been playing around a bit with the python version; board_urls() seems to take a step in this direction, where you provide a base URL, path, then a pins_path, which is not-too-different from the proposed manifest (of course you know all this, you wrote it!)

Apologies for excitedly throwing out wild possibilities - I am starting to appreciate the potential impact of pins, especially across languages, and 🤯

ijlyttle avatar Aug 08 '22 15:08 ijlyttle

That prototype looks so amazing @ijlyttle! 🙌

If you are up for making a somewhat speculative PR to implement something like this, would you do one that adds a manifest argument to board_url() and we can see how this plays out? I think this is really promising and could work well; having real code to look at and play around with could help us move forward.

juliasilge avatar Aug 19 '22 00:08 juliasilge

Thanks, and yes!

Assuming things work out for this PR (and for #631), it would be great to have them in Python, too (I'm sure you're thinking along those lines, as well).

I don't have a lot of experience in Python, but to the extent that @machow would tolerate me "getting in the way", I'd be happy to "help" there, too.

ijlyttle avatar Aug 19 '22 11:08 ijlyttle

Check out how this works now! Perhaps the easiest way to see this is this test:

https://github.com/rstudio/pins-r/blob/fd1708fd959d913c5aa2133353939e6ed484b852/tests/testthat/test-board_url.R#L72-L89

We also will have a new vignette outlining this approach, as described in #685.

We would love it if you tried this out (install via remotes::install_github("rstudio/pins-r")) and gave us any feedback before we send this to CRAN!
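A short sketch of the workflow, assuming an installed pins build that includes write_board_manifest(). The temporary directory and the `example.com` URL are placeholders, so the reading side is left commented out:

```r
# Runs only if the installed pins exports write_board_manifest().
if (requireNamespace("pins", quietly = TRUE) &&
    exists("write_board_manifest", where = asNamespace("pins"))) {
  board_dir <- tempfile("board")
  board <- pins::board_folder(board_dir, versioned = TRUE)
  pins::pin_write(board, mtcars, name = "mtcars", type = "rds")

  # Writes a manifest file at the root of the board, next to the pin
  # directories.
  pins::write_board_manifest(board)
  print(list.files(board_dir))
}

# After copying the whole board directory to a web server, a reader would
# pass the single base URL (placeholder shown here):
# board <- pins::board_url("https://example.com/board/")
# pins::pin_read(board, "mtcars")
```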

juliasilge avatar Dec 16 '22 01:12 juliasilge

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions[bot] avatar Dec 31 '22 00:12 github-actions[bot]