repo2docker
Adding support for OSF
We want to be able to support the immutable, frozen releases on OSF as an input to repo2docker.
So we should introduce the concept of content providers. Their job is to check out the given URL into the current local directory. They won't do any caching, and they would use traitlets to be pluggable.
We decided against having autodetection, since there is no way to do that reliably for everything, and doing that only for some providers seems complex.
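To make that concrete, here is a minimal sketch of what the base class could look like, assuming providers are traitlets Configurables (names and details here are illustrative, not a settled API):

from traitlets.config import Configurable


class Provider(Configurable):
    """Hypothetical base class for pluggable content providers.

    Being a Configurable means each provider can expose its own
    traitlets options without the core CLI knowing about them.
    """

    # short name used to select the provider, e.g. --provider=osf
    name = None
    # human readable description, e.g. for --help output
    description = None

    def provide(self, identifier):
        """Fetch the contents of `identifier` into the current working directory."""
        raise NotImplementedError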
So the new command line would look like:
jupyter-repo2docker --provider=osf https://<some-osf-url>
And the provider class would look something like:
from tornado import gen


class OSFProvider(Provider):
    name = 'osf'
    description = 'some description of this provider'

    @gen.coroutine
    def provide(self, identifier):
        """
        Fetch the contents of the given identifier into the current working directory.
        """
        pass
This means we only support a single identifier (such as a file path, a git URL including a ref, or an OSF URL) for each provider. This makes the implementation simpler, but we can modify it in the future if we need to.
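For illustration only, the --provider flag could be resolved to a class with a small registry along these lines (hypothetical sketch, not an existing repo2docker API):

# hypothetical registry mapping --provider values to provider classes
PROVIDERS = {
    OSFProvider.name: OSFProvider,
    # GitProvider.name: GitProvider, ...
}


def get_provider(name):
    """Look up a provider class by its command-line name and instantiate it."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError('Unknown provider: {}'.format(name))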
/cc @betatim @mfraezz
This will also need a supplemental issue in binderhub to support this on binder.
Think that looks good.
If we wanted to support more than a URL, we could also let the provider eat all the remaining arguments, so:
jupyter-repo2docker --provider=osf https://<some-osf-url> blah foo bar
We would pass the whole sys.argv as arguments to provide?
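Roughly, that variant could mean the CLI stops interpreting anything after the provider selection and just forwards the rest (hypothetical sketch; get_provider is the lookup helper sketched above):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--provider', default='git')
# every remaining positional argument is handed to the provider untouched
parser.add_argument('provider_args', nargs='*')

args = parser.parse_args(['--provider=osf', 'https://<some-osf-url>', 'blah', 'foo', 'bar'])
provider = get_provider(args.provider)
provider.provide(*args.provider_args)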
@betatim cool!
I think let's just go with providing a single identifier for now, and we can extend it when we run into a use case that explicitly requires it.
Related jupyterhub/binderhub#216
The schema.org JSON-LD can be used to get a list of files associated with a Zenodo DOI. This is what Google uses for their dataset search, so there is hope that this will become widely adopted because of their marketing power.
The problem with the JSON-LD is that it only gives you a list of files for a dataset, not a software archive.
However, you can ask for the information in JSON format (whatever schema that might be) and then you get a list of files both for datasets and for software:
curl -s -H "Accept: application/json" https://zenodo.org/api/records/1408168 | jq .files
curl -s -H "Accept: application/json" https://zenodo.org/api/records/1419226 | jq .files
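In Python, the same JSON endpoint could be queried roughly like this (sketch only; beyond the .files key shown in the curl examples above, the exact field names inside each file entry are an assumption about the Zenodo response):

import requests


def list_zenodo_files(record_id):
    """Return the 'files' section of a Zenodo record, as in the curl examples above."""
    url = 'https://zenodo.org/api/records/{}'.format(record_id)
    resp = requests.get(url, headers={'Accept': 'application/json'})
    resp.raise_for_status()
    return resp.json().get('files', [])


for entry in list_zenodo_files('1419226'):
    # the fields inside each entry (e.g. the download link) would need checking
    print(entry)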
Zenodo uses SoftwareSourceCode as the JSON-LD type to represent a code submission. That schema doesn't contain a field for an "archive link", so I think we won't have any luck with that.
My takeaway from this is that we should start with archive-specific content providers and share as much as possible between them. A first step towards that is #242.
Once that is merged we could add a (remote) ZIP/tar.gz file provider which can then be reused by a Zenodo provider and/or an OSF provider.
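A generic remote-archive provider along those lines could be little more than "download, then unpack into the current directory". A hypothetical sketch, reusing the Provider base class idea from above (class and attribute names are not a proposed API):

import os
import tarfile
import tempfile
import zipfile
from urllib.request import urlretrieve


class ArchiveProvider(Provider):
    # hypothetical reusable provider that a Zenodo or OSF provider could build on
    name = 'archive'
    description = 'Fetch a remote ZIP or tar.gz archive and unpack it'

    def provide(self, identifier):
        """Download the archive at `identifier` and unpack it into the current directory."""
        with tempfile.TemporaryDirectory() as tmp:
            path, _ = urlretrieve(identifier, os.path.join(tmp, 'archive'))
            if zipfile.is_zipfile(path):
                with zipfile.ZipFile(path) as zf:
                    zf.extractall('.')
            elif tarfile.is_tarfile(path):
                with tarfile.open(path) as tf:
                    tf.extractall('.')
            else:
                raise ValueError('Not a ZIP or tar archive: {}'.format(identifier))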