Geo "profile" for Data Package

Similar to the Simple Data Format Data Package "profile" for tabular data: that is, we just leverage the base datapackage.json but constrain the types of data resources you can ship.

Options for data formats:

geojson (most likely)
sqlite
geocsv ...
shapefile (probably not)

Could allow a couple of options but prefer to fix on one.

Dec 28 '13 21:12 rufuspollock

Things to store in a geo-enabled datapackage would typically be:

format of the file geojson/topojson/geocsv/kml/gml/geotiff/geojpeg
field(s) of type geometry (or 2 fields having lat-lon)
format of the geometry (json/wkt/wkb/gml/...)
projection (epsg:4326...)
bounds of the dataset
type of geometry (point/polyline/polygon/multipolygon/multiple -> this can probably be deduced from the WKT; however, if the top 5 rows contain a point you still can't be sure that every row does)
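
Several of the items above can be derived from the data rather than typed by hand. A minimal Python sketch, assuming a GeoJSON FeatureCollection without GeometryCollection members; the function name and the output keys `geometry_types` and `bounds` are hypothetical and not part of any Data Package spec. It scans every feature, since the first few rows do not guarantee the type of the rest:

```python
def summarize_geojson(collection):
    """Return the geometry types and bounding box of a GeoJSON FeatureCollection."""
    types = set()
    xs, ys = [], []

    def walk(coords):
        # A position is a list whose first element is a number;
        # anything else is a nested list of positions.
        if not coords:
            return
        if isinstance(coords[0], (int, float)):
            xs.append(coords[0])
            ys.append(coords[1])
        else:
            for item in coords:
                walk(item)

    for feature in collection["features"]:
        geometry = feature.get("geometry")
        if geometry is None:
            continue  # features without a location are allowed in GeoJSON
        types.add(geometry["type"])
        walk(geometry.get("coordinates", []))

    bounds = [min(xs), min(ys), max(xs), max(ys)] if xs else None
    return {"geometry_types": sorted(types), "bounds": bounds}


# Illustrative data only
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [4.35, 50.85]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString", "coordinates": [[4.0, 50.0], [5.0, 51.5]]}},
    ],
}
print(summarize_geojson(sample))
```

The bounds come out in (min x, min y, max x, max y) order, which is one common convention; a profile would have to pick one explicitly.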

Note that sqlite potentially stores many tables, and each table might require its own datapackage.json (maybe have a look at the GeoPackage specification, which introduces a metadata table within the database: https://github.com/opengis/geopackage)

Note that some datasets will not be flat tables; managing a complex schema is probably best handled in another issue

Feb 04 '14 15:02 pvgenuchten

@pvgenuchten really useful suggestions. I think our aim here should be to be as minimalistic as possible while still being useful to a reasonable number of people. Kind of 80/20 but even stronger.

So my question would be: what essential metadata do you need to e.g. import geojson usefully into something else? If the answer is none that would be great, but I'm imagining the projection might be important. cf also #81 (geo csv).

Feb 04 '14 18:02 rufuspollock

Strongly inclined to go with a recommendation of GeoJSON, with format set to geojson in the resource.

See also in progress recommendation at http://data.okfn.org/doc/publish-geodata

/cc @peterdesmet @jalbertbowden

Jun 07 '14 11:06 rufuspollock

Moving conversation regarding describing properties of a geo data package here. In reply to question by @rgrp regarding this:

could you advise what your use case is for describing the properties, for example will you be processing the data in some way that requires you to know the types of the property fields?

No, I don't have plans to process the data myself, I would just like to provide good metadata for the properties/fields, such as a description, or unit, or type. Example:

{
    "name": "code",
    "description": "Belgian traffic sign code.",
    "web": "http://wiki.openstreetmap.org/wiki/Road_signs_in_Belgium",
    "type": "string"
},

In the Tabular Data format one can do this in "schema": { "fields": [] } of the datapackage.json, which I find very useful. GeoJSON is new to me: maybe it's possible to add metadata about the properties in the GeoJSON file itself, but I quite like having it in the datapackage.json. I am trying to figure out the recommended way to do this.
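
To make the idea concrete, here is a rough Python sketch of how field metadata kept in datapackage.json could be compared against the properties of GeoJSON features. Reusing "schema": { "fields": [] } for GeoJSON properties is an assumption, not an agreed convention, and the helper names are hypothetical:

```python
def value_matches(value, type_name):
    """Check a property value against a small subset of Table Schema type names."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    return True  # other types are not checked in this sketch


def check_properties(collection, fields):
    """Return (feature index, field name, problem) for each mismatch found."""
    problems = []
    for index, feature in enumerate(collection["features"]):
        properties = feature.get("properties") or {}
        for field in fields:
            name = field["name"]
            if name not in properties:
                problems.append((index, name, "missing"))
            elif not value_matches(properties[name], field.get("type", "string")):
                problems.append((index, name, "wrong type"))
    return problems


# The field description from the example above
fields = [{
    "name": "code",
    "description": "Belgian traffic sign code.",
    "web": "http://wiki.openstreetmap.org/wiki/Road_signs_in_Belgium",
    "type": "string",
}]

# Illustrative features only
signs = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"code": "A1"}},
        {"type": "Feature", "geometry": None, "properties": {"code": 23}},
    ],
}
print(check_properties(signs, fields))
```

Keeping the descriptions in datapackage.json leaves the GeoJSON file untouched, which matters if the file is produced by another tool.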

Jun 08 '14 11:06 peterdesmet

@dr-shorthair any thoughts here about schemas for properties on features?

Sep 24 '15 14:09 rufuspollock

Coming late to this conversation. @pvgenuchten seems to have a good handle on the issues.

The principal dilemma is that, while 2 columns (x,y) looks like the obvious solution for points, it raises a lot of questions, particularly the key issue that coordinates are not independent of each other and shouldn't be managed or processed independently. So a micro-syntax is preferred which binds them together (as is already done for time in the ISO 8601/xsd 7-component string). Then the options are essentially GeoJSON or WKT. The former has the advantage of software support, but a significant limitation regarding non-2D geometries, and essentially non-existent support for coordinate reference systems. WKT is better on those issues, but is a very niche product! Both support various geometry types, labelled in the data. WKT allows a CRS to be referenced in the data. However, the GeoJSON CRS limitation may not be such a problem in this context, since you would only be using the GeoJSON geometry object and so could carry the CRS reference separately, but then we could hit the coordinate-order issue*. We would also have to extend GeoJSON for solid geometries if required.

* Standard CRS definitions also prescribe the coordinate order. There is a historical convention, which is respected by standard CRS definitions such as epsg:4326, that geographic coordinates are expressed in lat-lon order (i.e. y,x) while projected systems are generally (x,y). GeoJSON has a rule that, regardless of what the CRS says, the coordinate order is always (x,y). This may seem trivial, but there are many, many examples of how things can go wrong because of mistaken assumptions.
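
The coordinate-order trap is easy to show in a few lines. This is only a sketch in Python with hypothetical helper names: it takes a latitude/longitude pair, builds a GeoJSON Point (always x,y, so lon comes first), and serializes the same point as WKT in the same x y order:

```python
def geojson_point_from_lat_lon(lat, lon):
    """Build a GeoJSON Point; GeoJSON positions are always (x, y) = (lon, lat)."""
    return {"type": "Point", "coordinates": [lon, lat]}


def wkt_from_geojson_point(point):
    """Serialize a 2D GeoJSON Point as WKT, keeping the same x y order."""
    x, y = point["coordinates"][:2]
    return f"POINT ({x} {y})"


# Keyword arguments make it harder to swap the two values by accident.
point = geojson_point_from_lat_lon(lat=50.85, lon=4.35)
print(point["coordinates"])
print(wkt_from_geojson_point(point))
```

A reader that takes the lat-lon order from the epsg:4326 definition and applies it to these values would place the point at latitude 4.35, longitude 50.85, which is exactly the kind of mistaken assumption described above.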

Sep 25 '15 07:09 dr-shorthair

from @danfowler

weecology/retriever#797 @henrykironde

May 29 '17 11:05 pwalsh

I've started a guide on point data in CSVs. Your feedback is very welcome. It touches on some of the issues raised above (CRS, axis order). Other geometry in CSVs makes less sense to me, but I'm happy to write about that also.

Edit: Now published Point location data in CSV files
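
For illustration, a small Python sketch of reading point data from a CSV into GeoJSON. The column names `lat` and `lon` and the assumption that values are WGS84 decimal degrees are hypothetical choices for this example, not something taken from the guide:

```python
import csv
import io


def csv_points_to_geojson(text, lat_field="lat", lon_field="lon"):
    """Convert CSV text with latitude/longitude columns to a FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        lat = float(row.pop(lat_field))
        lon = float(row.pop(lon_field))
        features.append({
            "type": "Feature",
            # GeoJSON order is (lon, lat), whatever the column order was
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),  # remaining columns stay as strings
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative data only
text = "name,lat,lon\nBrussels,50.85,4.35\n"
collection = csv_points_to_geojson(text)
print(collection["features"][0])
```

Naming the two columns explicitly in the resource metadata is what lets a tool do this conversion without guessing the axis order.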

Jul 05 '17 07:07 Stephen-Gates

The Spatial Data Package specification:

This proposal provides specifications for the Spatial Data Package. The proposed specifications are an extension of the Data Package specification created by Frictionless Data. The Data Package specification currently covers tabular data (Tabular Data Package). The Tabular Data Package provides a platform to standardize and organize data, making sharing among tools and people effortless.

Relationship between a Tabular Data package and a spatial Data package

Unlike spatial data, tabular data is simply text data separated by special delimiters (comma, tab, etc.) in a text file. Spatial data occurs in various forms of complex data structures, often associated with the file extension.

Spatial data Categories

Spatial data is categorized into two groups, raster data and vector data. In the vector data model, geographical elements are represented using points, lines and polygons. Vector data captures and represents discrete objects with boundaries (lakes, rivers, roads, etc.).

The raster data model stores data elements using pixels or cells. The value of each cell captures the type of object or entity that is observed. A good example is a digital photograph: the pixels in the photo store a color that corresponds to the real-world object at that point. Rasters can store discrete data, for example thematic information on land cover, and continuous data, for example chemical concentrations (carbon dioxide, nitrates).

Vector Data Specifications

The specifications inherit the data package properties, such as:

Recommended Properties

name
id
licenses
profile

Optional Properties

title
description
homepage
version
sources
contributors
keywords
image
created

{
  # required
  "name": "name of the data",
  "title": "human readable label or title for the dataset",
  "gis_class": "Raster data or vector data",
  "file_type": "extension of format of the dataset",
  "description": "A good description for the dataset",
  "license": "A license",
  # keywords separated by comma
  "keywords": ["rivers", "North America"],
  "citation": "citation for the dataset",
  "spatial_ref": "Coordinate Reference System",
  "[path or url]": "path to the file",
  "resources": [
    # For each layer, give a name and the properties
    # layer one
    {
      "name": "Name for the layer eg. river",
      "Geometry_type": "point, linestring,....",
      "geometry_notation": "...",
      "NoDataValue": "what represents missing values",
      # define attribute data and type for each vector feature
      "schema": {
        "fields": [
          {
            "name": "data name",
            "type": "data type"
          },
          {
            "name": "data name",
            "type": "data type"
          },
          {...}
        ]
      }
    },
    # layer two
    {....},
    # layer three
    {..}
  ]
}
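
The template above is annotated with # comments, so it is not strict JSON. As a rough illustration, the Python sketch below builds a filled-in descriptor with the same key names, serializes it to strict JSON, and lists keys that are absent. All values are hypothetical, and treating these particular keys as required is an assumption, since the proposal does not say which ones are:

```python
import json

# Key names come from the template; which of them are required is an assumption.
PACKAGE_KEYS = ["name", "title", "gis_class", "file_type",
                "description", "license", "spatial_ref", "resources"]
LAYER_KEYS = ["name", "Geometry_type", "schema"]


def missing_keys(descriptor):
    """List template keys that are absent at package and layer level."""
    missing = [key for key in PACKAGE_KEYS if key not in descriptor]
    for index, layer in enumerate(descriptor.get("resources", [])):
        missing += [f"resources[{index}].{key}"
                    for key in LAYER_KEYS if key not in layer]
    return missing


# Hypothetical values, for illustration only
descriptor = {
    "name": "rivers",
    "gis_class": "vector",
    "file_type": "geojson",
    "keywords": ["rivers", "North America"],
    "spatial_ref": "EPSG:4326",
    "resources": [
        {
            "name": "river",
            "Geometry_type": "linestring",
            "schema": {"fields": [{"name": "length_km", "type": "number"}]},
        }
    ],
}

print(json.dumps(descriptor, indent=2))  # strict JSON, no comments
print(missing_keys(descriptor))
```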

Rasters

Like the vector data specifications, raster data specifications inherit the core components of the data package specifications. Rasters can have multiple nested datasets within a file; however, the JSON schema takes on a structure similar to the vector data schema.

The data package

Json schema example

{
    # required
    "name": "name of the data",
    "title": "human readable label or title for the dataset",
    "format": "extension of format of the dataset or driver required",
    "file_size": "size of file on disk",
    "group_count": "Number of groups in the dataset if applicable",
    "dataset_count": "The number of individual datasets",
    "description": "A good description for the dataset",
    "license": "A license",
    # keywords separated by comma
    "keywords": ["carbon map", "North America"],
    "citation": "citation for the dataset",
    "version": "The version of the dataset",
    "homepage": "The home page of the data",
    "datum": "Coordinate Reference System",
    "[url or path]": "link to where the data is stored",
    # each band is defined
    "resources": [
        {
            "Group": "Name for the group if applicable",
            "name": "Name for the band",
            "relative_path": "Location relative to root path/url above",
            "resolution": "The resolution",
            "resolution_units": "The units of resolution",
            "dimensions": "dimensions",
            "noDataValue": "pixels where data is missing or no data collected",
            "geoTransform": "The transformation of the dataset",
            "parameter": "The parameter or feature",
            "extent": ["the extent values of the band"]
        },
        {...}
    ]
}
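
The "geoTransform", "dimensions" and "extent" entries are related: given the first two, the third can be computed. The Python sketch below assumes that "geoTransform" holds the six coefficients in the order GDAL uses (x origin, pixel width, row rotation, y origin, column rotation, pixel height); the proposal does not state this, so treat it as an assumption. The function name is hypothetical:

```python
def extent_from_geotransform(gt, width, height):
    """Compute [min x, min y, max x, max y] from a six-coefficient affine
    transform (GDAL ordering assumed) and the raster size in pixels."""
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    # x = gt[0] + col * gt[1] + row * gt[2]
    # y = gt[3] + col * gt[4] + row * gt[5]
    xs = [gt[0] + col * gt[1] + row * gt[2] for col, row in corners]
    ys = [gt[3] + col * gt[4] + row * gt[5] for col, row in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


# Illustrative values: a north-up raster, 4 columns by 3 rows, 10-unit pixels
gt = (100.0, 10.0, 0.0, 500.0, 0.0, -10.0)
print(extent_from_geotransform(gt, width=4, height=3))
```

If the extent is also stored in the descriptor, a check like this can flag descriptors where the two disagree.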

Jul 31 '17 06:07 henrykironde

Thanks @Stephen-Gates for the comments in #499. Could you transfer them to this issue?

Jul 31 '17 06:07 henrykironde

Thanks for this Henry.

I think a worked example using real data would help to clearly separate what's needed in a:

- spatial data package - similar to tabular data package. E.g.
  - Each resource MUST be a Spatial Data Resource
  - or could a mix of Spatial and Tabular data be in a package?
  - should spatial and temporal extent be described at this level or for each resource?
- spatial data resource - similar to tabular data resource. E.g.
  - the spatial reference system must be included.
  - the supported file types (GeoJSON, GML, etc)
  - would a CSV with point data be a valid resource?
- "layer schema" - similar to table schema

Thanks for starting the conversation.
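
One way to explore these questions is to write the candidate rules down as a check. The Python sketch below encodes one possible answer: every resource must use a spatial format, and the spatial reference system may be given per resource or inherited from the package. The `spatial_ref` key is borrowed from the proposal above; the `format` values and the set of allowed formats are assumptions made for illustration:

```python
SPATIAL_FORMATS = {"geojson", "gml"}  # assumed list, not settled in this thread


def check_package(package):
    """Return a list of rule violations for a candidate spatial data package."""
    errors = []
    for resource in package.get("resources", []):
        name = resource.get("name", "?")
        if resource.get("format") not in SPATIAL_FORMATS:
            errors.append(f"{name}: not a spatial format")
        # A resource-level value wins; otherwise fall back to the package level.
        if not (resource.get("spatial_ref") or package.get("spatial_ref")):
            errors.append(f"{name}: no spatial reference system")
    return errors


# Hypothetical package mixing a GeoJSON layer and a CSV with point data
package = {
    "name": "transport",
    "spatial_ref": "EPSG:4326",
    "resources": [
        {"name": "routes", "format": "geojson"},
        {"name": "stops", "format": "csv"},
    ],
}
print(check_package(package))
```

Under these rules the CSV resource is rejected; allowing point data in CSV would mean adding "csv" to the set plus a rule for naming the coordinate columns.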

Jul 31 '17 08:07 Stephen-Gates

@Stephen-Gates, Thanks for the suggestion, I will get some sample data to annotate as examples.

Aug 01 '17 16:08 henrykironde

This is being further developed, and feedback is very welcome in the issues, at https://github.com/cividi/spatial-data-package

Jan 21 '22 09:01 loleg

@loleg that's great ... could you provide a brief summary of state and plans here?

Jan 24 '22 10:01 rufuspollock

Hi @rufuspollock, thanks for checking in.

Very happy to get some feedback on https://github.com/cividi/spatial-data-package#detailed-data-package-structure. A proof-of-concept viewer is implemented in dfour. It is deployed, for example, for simple web publication of client projects with gemeindescan.ch, as a self-publishing tool for events, like sandbox.dfour.space or campusbochum.de, and for public participation, like https://beteiligung.campusbochum.de/de/SDY4F/0N2AQB/.

Pros

no dependency on a specific library or implementation -> independent of renderer, e.g. simple styles spec supported in many map libraries and tools (e.g. geojson.io, GitHub Previews, ...)
styles "baked in" -> curated snapshot, human readable, no interpretation needed

Cons

requires extra tooling to create styles: hard to update or change style, e.g. we wrote a special QGIS Plugin
style not declarative/rule based -> no support for complex style definitions (e.g. zoom based)
currently requires/only supports (inline) geojson -> no support for tabular data, e.g. CSV(T) or other frictionless compliant geo data

Potential options

Separate data and style definition, e.g. similar to Vega-Lite, but an abstraction of mapbox-gl styles
Vega-Lite geo
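
To illustrate the trade-off between "baked in" styles and a separate style definition, here is a minimal Python sketch. It keeps a small rule table apart from the data and writes simplestyle-spec properties (such as `fill` and `fill-opacity`) into a copy of the GeoJSON. The rule format, the `landuse` property and the function name are hypothetical; this is not how dfour or the QGIS plugin work:

```python
import copy


def bake_styles(collection, rules, key):
    """Return a copy of the collection with style properties added per feature,
    chosen by the value of the property named `key`."""
    styled = copy.deepcopy(collection)
    for feature in styled["features"]:
        properties = feature.get("properties") or {}
        properties.update(rules.get(properties.get(key), {}))
        feature["properties"] = properties
    return styled


# Style rules kept separate from the data (illustrative values)
rules = {
    "park": {"fill": "#00aa00", "fill-opacity": 0.5},
    "water": {"fill": "#0000ff", "fill-opacity": 0.5},
}

data = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"landuse": "park"}},
    ],
}

snapshot = bake_styles(data, rules, key="landuse")
print(snapshot["features"][0]["properties"])
```

The output is still a self-contained, renderer-independent snapshot, while changing a style only means editing the rule table and baking again.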

Feb 01 '22 21:02 n0rdlicht
