repo2docker
Adding support for OSF
We want to be able to support the immutable, frozen releases on OSF as an input to repo2docker.
So we should introduce the concept of content providers. Their job is to check out the given URL into the current local directory. They won't do any caching, and they would use traitlets to be pluggable.
We decided against having autodetection, since there is no way to do that reliably for everything, and doing that only for some providers seems complex.
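To make that concrete, here is a minimal sketch of what the base class could look like, assuming providers are traitlets Configurables (names and details here are illustrative, not a settled API):

from traitlets.config import Configurable


class Provider(Configurable):
    """Hypothetical base class for pluggable content providers.

    Being a Configurable means each provider can expose its own
    traitlets options without the core CLI knowing about them.
    """

    # short name used to select the provider, e.g. --provider=osf
    name = None
    # human readable description, e.g. for --help output
    description = None

    def provide(self, identifier):
        """Fetch the contents of `identifier` into the current working directory."""
        raise NotImplementedError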
So the new command line would look like:
jupyter-repo2docker --provider=osf https://<some-osf-url>
And the provider class would look something like:
from tornado import gen


class OSFProvider(Provider):
    name = 'osf'
    description = 'some description of this provider'

    @gen.coroutine
    def provide(self, identifier):
        """
        Fetch the contents of the given identifier into the current working directory.
        """
        pass
This means we only support a single identifier (such as a file path, a git URL including a ref, or an OSF URL) for each provider. This makes the implementation simpler, but we can modify it in the future if we need to.
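For illustration only, the --provider flag could be resolved to a class with a small registry along these lines (hypothetical sketch, not an existing repo2docker API):

# hypothetical registry mapping --provider values to provider classes
PROVIDERS = {
    OSFProvider.name: OSFProvider,
    # GitProvider.name: GitProvider, ...
}


def get_provider(name):
    """Look up a provider class by its command-line name and instantiate it."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError('Unknown provider: {}'.format(name))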
/cc @betatim @mfraezz
This will also need a supplemental issue in binderhub to support this on binder.
Think that looks good.
If we wanted to support more than a URL, we could also let the provider eat all the remaining arguments, so:
jupyter-repo2docker --provider=osf https://<some-osf-url> blah foo bar
We would pass the whole sys.argv as arguments to provide?
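Roughly, that variant could mean the CLI stops interpreting anything after the provider selection and just forwards the rest (hypothetical sketch; get_provider is the lookup helper sketched above):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--provider', default='git')
# every remaining positional argument is handed to the provider untouched
parser.add_argument('provider_args', nargs='*')

args = parser.parse_args(['--provider=osf', 'https://<some-osf-url>', 'blah', 'foo', 'bar'])
provider = get_provider(args.provider)
provider.provide(*args.provider_args)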
@betatim cool!
I think let's just go with providing a single identifier for now, and we can extend it when we run into a use case that explicitly requires it.
Related jupyterhub/binderhub#216
The schema.org JSON-LD can be used to get a list of files associated with a Zenodo DOI. This is what Google uses for their dataset search, so there is hope that this will become widely adopted because of their marketing power.
The problem with the JSON-LD is that it only gives you a list of files for a dataset, not a software archive.
However, you can ask for the information in JSON format (whatever schema that might be) and then you get a list of files both for datasets and for software:
curl -s -H "Accept: application/json" https://zenodo.org/api/records/1408168 | jq .files
curl -s -H "Accept: application/json" https://zenodo.org/api/records/1419226 | jq .files
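In Python, the same JSON endpoint could be queried roughly like this (sketch only; beyond the .files key shown in the curl examples above, the exact field names inside each file entry are an assumption about the Zenodo response):

import requests


def list_zenodo_files(record_id):
    """Return the 'files' section of a Zenodo record, as in the curl examples above."""
    url = 'https://zenodo.org/api/records/{}'.format(record_id)
    resp = requests.get(url, headers={'Accept': 'application/json'})
    resp.raise_for_status()
    return resp.json().get('files', [])


for entry in list_zenodo_files('1419226'):
    # the fields inside each entry (e.g. the download link) would need checking
    print(entry)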
Zenodo uses SoftwareSourceCode as the JSON-LD type to represent a code submission. That schema doesn't contain a field for an "archive link", so I think we won't have any luck with that.
My takeaway from this is that we should start with archive-specific content providers and share as much as possible between them. A first step towards that is #242.
Once that is merged we could add a (remote) ZIP/tar.gz file provider which can then be reused by a Zenodo provider and/or an OSF provider.
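A generic remote-archive provider along those lines could be little more than "download, then unpack into the current directory". A hypothetical sketch, reusing the Provider base class idea from above (class and attribute names are not a proposed API):

import os
import tarfile
import tempfile
import zipfile
from urllib.request import urlretrieve


class ArchiveProvider(Provider):
    # hypothetical reusable provider that a Zenodo or OSF provider could build on
    name = 'archive'
    description = 'Fetch a remote ZIP or tar.gz archive and unpack it'

    def provide(self, identifier):
        """Download the archive at `identifier` and unpack it into the current directory."""
        with tempfile.TemporaryDirectory() as tmp:
            path, _ = urlretrieve(identifier, os.path.join(tmp, 'archive'))
            if zipfile.is_zipfile(path):
                with zipfile.ZipFile(path) as zf:
                    zf.extractall('.')
            elif tarfile.is_tarfile(path):
                with tarfile.open(path) as tf:
                    tf.extractall('.')
            else:
                raise ValueError('Not a ZIP or tar archive: {}'.format(identifier))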