Standardise method for referencing specific resource inside datapackage

Open akariv opened this issue 8 years ago • 4 comments

The goal here is to have a URI that could point to a specific resource inside a datapackage. We already have a standard way for identifying a datapackage: http://specs.frictionlessdata.io/data-package-identifier/.

This proposal suggests to add a means for referencing the data file contained inside the data package.

Implementation options (not mutually exclusive):

  • Using JSON Pointer notation: <datapackage-identifier>#/resources/<resource-index>/data. Examples:
    • http://mywebsite.com/mydatapackage/datapackage.json#/resources/0/data
    • http://mywebsite.com/mydatapackage/#/resources/1/data
    • http://github.com/datasets/gold-prices#/resources/2/data
    • gold-prices#/resources/3/data
  • Using the resource name: <datapackage-identifier>#<resource-name>. Since resources is an array, a JSON Pointer can only address a resource by its index, not by its name (unless we start using stronger pointing mechanisms such as XPath, and we shouldn't...), so this form puts the name directly in the fragment. Examples:
    • http://mywebsite.com/mydatapackage/datapackage.json#my-lovely-resource
    • http://mywebsite.com/mydatapackage/#finances-2012-q3
    • http://github.com/datasets/gold-prices#all-data
    • gold-prices#all-data

The two notations cannot be confused with each other, as a resource name cannot start with a / (it's a slug), so any fragment that starts with a / must be a JSON Pointer.
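
As a rough illustration of how a client could tell the two notations apart, here is a minimal sketch. The names `ResourceRef` and `parse_resource_ref` are hypothetical and not part of any existing library; the only rule taken from the proposal is that a fragment starting with a / is a JSON Pointer and anything else is a resource name.

```python
import re
from typing import NamedTuple, Optional


class ResourceRef(NamedTuple):
    """Hypothetical container for a parsed resource reference."""
    package: str          # the data package identifier (everything before '#')
    index: Optional[int]  # set when the JSON Pointer form was used
    name: Optional[str]   # set when the resource-name form was used


# Only the exact shape from the proposal is accepted: /resources/<index>/data
_POINTER = re.compile(r"/resources/([0-9]+)/data")


def parse_resource_ref(ref: str) -> ResourceRef:
    package, sep, fragment = ref.partition("#")
    if not sep or not fragment:
        raise ValueError(f"no resource fragment in {ref!r}")
    if fragment.startswith("/"):
        # A slug cannot start with '/', so this must be the pointer form.
        match = _POINTER.fullmatch(fragment)
        if match is None:
            raise ValueError(f"unsupported pointer {fragment!r}")
        return ResourceRef(package, int(match.group(1)), None)
    # Otherwise the whole fragment is taken to be the resource name.
    return ResourceRef(package, None, fragment)


print(parse_resource_ref("gold-prices#/resources/3/data"))
print(parse_resource_ref("http://mywebsite.com/mydatapackage/#finances-2012-q3"))
```

Resolving the `package` part itself is left to the existing data package identifier spec.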

Implementors might use this notation in the following ways:

  • For basic datapackages, this URI might redirect to the URL of the actual data file of the resource. If the data is inline, it should resolve to an application/json download of that part of the datapackage descriptor.
  • For tabular datapackages, a supporting library might resolve this URI to an iterator over all the rows in the data.
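
A minimal sketch of both behaviours, under some assumptions: the descriptor is already loaded as a dict, a resource carries either inline `data` or a `path` relative to the descriptor URL, and the function names `resolve_resource` and `iter_rows` are made up for this example.

```python
import csv
import io
from urllib.parse import urljoin


def resolve_resource(descriptor: dict, base_url: str, index: int):
    """Decide what a request for resource `index` should resolve to."""
    resource = descriptor["resources"][index]
    if "data" in resource:
        # Inline data: serve that part of the descriptor as application/json.
        return ("inline", resource["data"])
    # Assumption: the data file location is given by a relative `path`.
    return ("redirect", urljoin(base_url, resource["path"]))


def iter_rows(csv_text: str):
    """Yield each row of a tabular resource as a dict keyed by header."""
    yield from csv.DictReader(io.StringIO(csv_text))


descriptor = {
    "resources": [
        {"name": "all-data", "path": "data/prices.csv"},
        {"name": "notes", "data": [{"year": 2016, "note": "revised"}]},
    ]
}
base = "http://mywebsite.com/mydatapackage/datapackage.json"
print(resolve_resource(descriptor, base, 0))
print(resolve_resource(descriptor, base, 1))
```

Fetching the file behind a redirect and feeding its text to `iter_rows` is deliberately left out here.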

Thoughts: This is important since right now there's no way to provide a stable link to a specific data file inside a datapackage. This led me to wonder whether we also want a means of providing a stable link to a specific row inside a tabular datapackage, or perhaps even to a specific field. That matters too: if, for example, you wanted to substantiate a specific claim ('The budget for NHS was £350M in 2016'), you could have a single URI pointing to that specific number.

akariv avatar Feb 15 '17 08:02 akariv

@rufuspollock I prefer the JSON Pointer solution to this issue, perhaps obviously, and this is a great example of where we can reuse existing spec work when developing our own referencing mechanisms.

Also note that, as JSON Pointer is simply a syntax for traversal of nested objects, it can also be directly used to implement the final "thought" that @akariv has added in the issue, as a means to reference a specific cell of data.
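
To make the traversal concrete, here is a small sketch of JSON Pointer resolution following the RFC 6901 rules (split on /, unescape ~1 and ~0, index into lists by number). It assumes the resource data is inline in the descriptor; the function name `resolve_pointer` and the sample descriptor are invented for illustration.

```python
def resolve_pointer(document, pointer: str):
    """Walk `document` along a JSON Pointer such as /resources/0/data/0/amount."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in this order so that '~01' becomes '~1', not '/'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not (token.isascii() and token.isdigit()):
                raise ValueError(f"expected an array index, got {token!r}")
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Sample descriptor with inline data (values are placeholders).
descriptor = {
    "name": "budget",
    "resources": [
        {"name": "departments", "data": [{"dept": "NHS", "year": 2016, "amount": 350}]}
    ],
}
print(resolve_pointer(descriptor, "/resources/0/data"))           # the whole resource
print(resolve_pointer(descriptor, "/resources/0/data/0/amount"))  # a single cell
```

For data kept in an external file, the pointer would have to be applied to the parsed rows instead of the descriptor, which this sketch does not cover.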

All in all, I'm +1 on supporting this type of thing, and I can also see how it plays into features we will want to add to OpenSpending for referring to data facts directly (see https://github.com/openspending/openspending/issues/1186).

I think I'd be keen to target this for v1.1. Note, however, that this will reintroduce JSON Pointer, so we really need to know where you stand on this, @rufuspollock.

@akariv @roll further comments from you appreciated.

pwalsh avatar May 29 '17 07:05 pwalsh

I'm really interested in this issue. I suggest that it become a pattern asap with potential for v1.1.

Re JSON pointer question: two concerns here:

  • My general concern is that JSON Pointer brings a lot of unnecessary baggage, both conceptually and in terms of lib dependency. In most cases I feel we are overgeneralizing: "we could solve this specific issue and lots of other stuff too" (our coder tendencies!)
  • http://mywebsite.com/mydatapackage/datapackage.json#/resources/0/data just seems less nice than http://mywebsite.com/mydatapackage/datapackage.json#my-lovely-resource

The interesting question would be how we could reference a cell. There is the existing CSV fragment identifier work, which I always quite liked and which you could reuse for tabular data in general: https://tools.ietf.org/html/rfc7111
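
For a feel of what that could look like, here is a minimal sketch of the three RFC 7111 selection schemes (row=, col=, cell=) applied to CSV text. It handles single positions only, not ranges or multiple selections, and it assumes 1-based positions that count every row in the file; the function name `select_fragment` is hypothetical.

```python
import csv
import io


def _position(value: str) -> int:
    """Convert a 1-based position (assumed numbering) to a 0-based index."""
    number = int(value)
    if number < 1:
        raise ValueError(f"positions start at 1, got {value!r}")
    return number - 1


def select_fragment(csv_text: str, fragment: str):
    """Return the row, column or cell named by a fragment like 'cell=2,3'."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    scheme, _, value = fragment.partition("=")
    if scheme == "row":
        return rows[_position(value)]
    if scheme == "col":
        col = _position(value)
        return [row[col] for row in rows]
    if scheme == "cell":
        row_value, col_value = value.split(",")
        return rows[_position(row_value)][_position(col_value)]
    raise ValueError(f"unsupported fragment {fragment!r}")


text = "dept,year,amount\nNHS,2016,350\n"
print(select_fragment(text, "row=2"))
print(select_fragment(text, "cell=2,3"))
```

A resource reference could then be combined with such a fragment to point at one number, though how the two would be joined in a single URI is an open question here.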

rufuspollock avatar Jun 14 '17 08:06 rufuspollock

Do you have some actual use-cases for this? While it looks like it makes sense I don't fully understand where / why it would be used.

I understand the need to reference a specific tabular item, but how will it be used? In what scenario will a user input a URL to a specific resource / item, and what will happen / with what tool will they open this link?

OriHoch avatar Jun 18 '17 09:06 OriHoch

E.g. with tabulator, you might want to open a single table (one resource) from a data package. In a data package registry, you would want to point to a specific data set in a big data package, etc.

akariv avatar Jun 20 '17 07:06 akariv