
Find a way to make `contentURL` in `access.csv` be automatic

Open amoeba opened this issue 6 years ago • 2 comments

I'm not entirely sure this is trivial but I hope it is:

When access.csv gets filled in automatically, the contentURL field is blank. The actual URL to the file would follow the GitHub convention for serving raw files over HTTP:

https://raw.githubusercontent.com/{user|org}/{repo}/{branch}/path/to/file.ext

I think we know this, or can find it out, before we create the HTML. Take a shot at it and report back! Perhaps git2r does this in a nice way.
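A minimal sketch of that convention in base R, assuming the remote URL and branch name are already known (for example from `git remote get-url origin` and `git rev-parse --abbrev-ref HEAD`, or from git2r). The helper names `raw_github_url()` and `fill_content_url()` and the `fileName`/`contentUrl` column names are hypothetical, not part of dataspice:

```r
# Build a raw.githubusercontent.com URL for a file tracked in a GitHub repo.
# `remote` is the remote URL in either HTTPS or SSH form (assumed input).
raw_github_url <- function(remote, branch, path) {
  # Reduce the remote to its "user/repo" slug
  slug <- sub("^(https://github\\.com/|git@github\\.com:)", "", remote)
  slug <- sub("/$", "", slug)
  slug <- sub("\\.git$", "", slug)
  # Drop a leading "./" so the path joins cleanly
  path <- sub("^\\./", "", path)
  paste("https://raw.githubusercontent.com", slug, branch, path, sep = "/")
}

# Fill only the blank entries of a (hypothetical) contentUrl column,
# leaving any URL the user already entered untouched.
fill_content_url <- function(access, remote, branch) {
  blank <- is.na(access$contentUrl) | access$contentUrl == ""
  if (any(blank)) {
    access$contentUrl[blank] <-
      raw_github_url(remote, branch, access$fileName[blank])
  }
  access
}

# Stand-in for the contents of access.csv
access <- data.frame(
  fileName = c("data/a.csv", "./data/b.csv", "data/c.csv"),
  contentUrl = c(NA, "", "https://example.org/c.csv"),
  stringsAsFactors = FALSE
)
access <- fill_content_url(access, "git@github.com:user/repo.git", "master")
```

This does not URL-encode paths and only works when the file is actually committed and pushed to that branch.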

amoeba avatar May 25 '18 03:05 amoeba

Perhaps not much of a concern for us at this time, but this does assume that the data file is committed to git, which is often not ideal and will fail for data files larger than 50 MB.

I've been exploring various ways around this (e.g. see unconf issue https://github.com/ropensci/unconf18/issues/51). My current strategy is to take a cue from Rich FitzJohn and upload the data as assets attached to a release. piggyback will let you do something like:

library(piggyback)
pb_upload("user/repo", tag = "data", file = "mydata.csv.gz")
url <- pb_download_url("user/repo", tag = "data", file = "mydata.csv.gz")

to construct a download url for the asset.
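For reference, GitHub serves release assets under a predictable path, so the resulting URL can also be sketched without any network call. This assumes the asset is stored under the file's base name and that the release tag is `data` as above; `release_asset_url()` is a hypothetical helper, not a piggyback function:

```r
# Compose the standard GitHub release-asset download URL.
# `repo` is "user/repo"; `tag` is the release tag; `file` is the asset file.
release_asset_url <- function(repo, tag, file) {
  paste("https://github.com", repo, "releases/download", tag,
        basename(file), sep = "/")
}

asset_url <- release_asset_url("user/repo", "data", "mydata.csv.gz")
```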

This should work for any individual data file up to 2 GB in size. I know this mid-size range of > 50 MB but < 2 GB isn't huge, so it may not be particularly useful for most people, but it is an easy way to avoid cluttering up a git repo.

Of course, ideally the data would eventually be uploaded to a DOI-providing repository, and contentURL would be amended to point there anyway.

cboettig avatar May 29 '18 20:05 cboettig

Whoa, piggyback is cool. This looks like a nice way to make a file shareable quickly, given that the user already has a repo on GitHub. Does Zenodo archive release assets like the ones piggyback creates?

Part of me likes the idea of scoping this package to "data checked into git" but that might be just to simplify things for me rather than a user.

Stepping back, I had thought about how we'd support users who make use of non-local files in their scripts. There's nothing in our metadata generation process that prohibits a user from filling in more rows in access.csv by hand, but it'd be nice to automate this. Can you think of any other patterns we could leverage to automatically fill in rows in access.csv (and attributes.csv too, for that matter) when the user wants to document more than the files checked in under ./data?

amoeba avatar May 29 '18 22:05 amoeba