Geo "profile" for Data Package
Similar to the Simple Data Format Data Package "profile" for tabular data: that is, we just leverage the base datapackage.json but constrain the types of data resources you can ship.
Options for data formats:
- geojson (most likely)
- sqlite
- geocsv ...
- shapefile (probably not)
Could allow a couple of options but prefer to fix on one.
Things to store in a geo-enabled datapackage would typically be:
- format of the file geojson/topojson/geocsv/kml/gml/geotiff/geojpeg
- field(s) of type geometry (or 2 fields having lat-lon)
- format of the geometry (json/wkt/wkb/gml/...)
- projection (epsg:4326...)
- bounds of the dataset
- type of geometry (point/polyline/polygon/multipolygon/multiple). This can probably be deduced from the WKT, but if the top 5 rows contain a point you still cannot know that all rows contain points.
Note that sqlite potentially stores many tables; each table might require its own datapackage.json (maybe have a look at the GeoPackage specification, which introduces a metadata table within the database: https://github.com/opengis/geopackage)
Note that some datasets will not be flat tables; managing a complex schema is probably best handled in another issue.
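To illustrate the point about geometry types above, here is a minimal Python sketch that reads the geometry keyword from every WKT value instead of sampling the first few rows. The sample rows and the function names are made up for illustration; the keyword match is a rough regex, not a WKT parser.

```python
import re
from collections import Counter

# Matches the leading geometry keyword of a WKT string,
# e.g. "POINT" in "POINT (4.35 50.85)". This is not a full WKT parser.
_WKT_TYPE = re.compile(r"^\s*([A-Za-z]+)")

def geometry_types(wkt_values):
    """Count the geometry keywords found across all WKT values."""
    counts = Counter()
    for value in wkt_values:
        match = _WKT_TYPE.match(value or "")
        counts[match.group(1).upper() if match else "UNKNOWN"] += 1
    return counts

def summarize(counts):
    """Return the single geometry type, or "multiple" when the column is mixed."""
    return next(iter(counts)).lower() if len(counts) == 1 else "multiple"

# Invented sample column: five points followed by one linestring.
rows = [
    "POINT (4.35 50.85)",
    "POINT (4.40 51.22)",
    "POINT (3.72 51.05)",
    "POINT (5.57 50.63)",
    "POINT (4.70 50.88)",
    "LINESTRING (4.35 50.85, 4.40 51.22)",
]

print(summarize(geometry_types(rows[:5])))  # looks only at the first five rows
print(summarize(geometry_types(rows)))      # looks at the whole column
```

Sampling the first five rows reports a single type, while the full scan reports a mixed column, which is why a declared geometry type in the metadata would be useful.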
@pvgenuchten really useful suggestions. I think our aim here should be to try and be as minimalistic as possible compatible with being useful to a reasonable number of people. Kind of 80/20 but even stronger.
So my question would be: what essential metadata do you need to, e.g., import geojson usefully into something else? If the answer is none that would be great, but I'm imagining the projection might be important. Cf. also #81 (geo csv).
Strongly inclined to go with a recommendation of geojson, with format geojson in the resource.
See also in progress recommendation at http://data.okfn.org/doc/publish-geodata
/cc @peterdesmet @jalbertbowden
Moving conversation regarding describing properties of a geo data package here. In reply to question by @rgrp regarding this:
could you advise what your use case is for describing the properties, for example will you be processing the data in some way that requires you to know the types of the property fields?
No, I don't have plans to process the data myself. I would just like to provide good metadata for the properties/fields, such as a description, or unit, or type. Example:
{
"name": "code",
"description": "Belgian traffic sign code.",
"web": "http://wiki.openstreetmap.org/wiki/Road_signs_in_Belgium",
"type": "string"
},
In the Tabular data format one can do this in "schema": { "fields": [] } of the datapackage.json, which I find very useful. Geojson is new to me: maybe it's possible to add metadata about the properties in the geojson file itself, but I quite like it in the datapackage.json. I am trying to figure out the recommended way to do this.
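A rough Python sketch of what this could look like: the "schema.fields" block from the tabular format is reused to document GeoJSON feature properties, and a small helper lists properties that have no entry. Reusing "schema" this way is an assumption, not something the Data Package specification defines; the resource name, path, sample feature and the helper `undocumented_properties` are all hypothetical.

```python
import json

# Hypothetical resource descriptor. "schema.fields" is borrowed from the
# tabular format and applied to GeoJSON feature properties (an assumption).
resource = {
    "name": "traffic-signs",
    "path": "traffic-signs.geojson",
    "format": "geojson",
    "schema": {
        "fields": [
            {
                "name": "code",
                "description": "Belgian traffic sign code.",
                "web": "http://wiki.openstreetmap.org/wiki/Road_signs_in_Belgium",
                "type": "string",
            }
        ]
    },
}

# Made-up sample feature, used only to exercise the helper below.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [4.35, 50.85]},
    "properties": {"code": "C43", "height": 2.1},
}

def undocumented_properties(feature, resource):
    """Return the feature property names that have no entry in schema.fields."""
    documented = {field["name"] for field in resource["schema"]["fields"]}
    return sorted(set(feature["properties"]) - documented)

print(json.dumps(resource, indent=2))
print(undocumented_properties(feature, resource))
```

Keeping the descriptions in datapackage.json leaves the GeoJSON file itself untouched, which matches the preference expressed above.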
@dr-shorthair any thoughts here about schemas for properties on features.
Coming late to this conversation. @pvgenuchten seems to have good handle on the issues.
The principal dilemma is that, while 2 columns (x,y) looks like the obvious solution for points, it raises a lot of questions, particularly the key issue that coordinates are not independent of each other and shouldn't be managed or processed independently. So a micro-syntax is preferred which binds them together (as is already done for time in the ISO 8601/xsd 7-component string). Then the options are essentially GeoJSON or WKT. The former has the advantage of software support, but a significant limitation regarding non-2D geometries, and essentially non-existent support for coordinate reference systems. WKT is better on those issues, but is a very niche product! Both support various geometry types, labelled in the data. WKT allows a CRS to be referenced in the data. However, the GeoJSON CRS limitation may not be such a problem in this context, since you would only be using the GeoJSON geometry object and so could carry the CRS reference separately, but then we could hit the coordinate-order issue*. We would also have to extend GeoJSON for solid geometries if required.
* Standard CRS definitions also prescribe the coordinate order. There is a historical convention, which is respected by the standard CRS definitions such as epsg:4326, that geographic coordinates are expressed in lat-lon order (i.e. y,x) while projected systems are generally (x,y). GeoJSON has a rule that, regardless of what the CRS says, the coordinate order is always (x,y). This may seem trivial, but there are many, many examples of how things can go wrong because of mistaken assumptions.
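The coordinate-order issue can be made concrete with a small Python sketch. The helper name `latlon_to_geojson_point` is hypothetical; it takes a latitude/longitude pair (the order epsg:4326 prescribes) and emits a GeoJSON Point, which always stores the position as [x, y], i.e. [longitude, latitude]. The range checks only catch some swapped inputs, so they are a safety net rather than a guarantee.

```python
def latlon_to_geojson_point(lat, lon):
    """Build a GeoJSON Point from a (latitude, longitude) pair.

    GeoJSON positions are always [x, y], so the pair is swapped on output.
    """
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range (swapped axes?): {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return {"type": "Point", "coordinates": [lon, lat]}

# Brussels, given in lat-lon order as it would be under epsg:4326.
brussels = latlon_to_geojson_point(50.85, 4.35)
print(brussels)

# A swapped pair is only detected when the value cannot be a latitude.
try:
    latlon_to_geojson_point(151.21, -33.87)
except ValueError as error:
    print(error)
```

A swapped pair such as (4.35, 50.85) would pass both checks silently, which is exactly the kind of mistaken assumption described above.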
from @danfowler
weecology/retriever#797 @henrykironde
I've started a guide on point data in CSVs. Your feedback is very welcome. It touches on some of the issues raised above (CRS, axis-order). Other geometry in CSVs makes less sense to me, but I am happy to write about that also.
Edit: Now published Point location data in CSV files
The Spatial Data Package specification:
This proposal provides specifications for the Spatial Data Package. The proposed specifications are an extension of the Data Package specification created by Frictionless Data. The Data Package specification currently covers tabular data (Tabular Data Package). The Tabular Data Package provides a way to standardize and organize data, making sharing among tools and people effortless.
Relationship between a Tabular Data package and a spatial Data package
Unlike spatial data, tabular data is simply text separated by special delimiters (comma, tab, etc.) in a text file. Spatial data occurs in various complex data structures, often associated with the file extension.
Spatial data Categories
Spatial data is categorized into two groups: raster data and vector data. In the vector data model, geographical elements are represented using points, lines and polygons. Vector data captures and represents discrete objects with boundaries (lakes, rivers, roads, etc.).
The raster data model stores data elements as pixels or cells. The value of each cell captures the type of object or entity that is observed. A good example is a digital photograph: the pixels in the photo store a color that corresponds to the real-world object at that point. Rasters can store discrete data, for example thematic land cover information, and continuous data, for example chemical concentrations (carbon dioxide, nitrates).
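As a rough illustration of the two categories, the Python sketch below holds a vector object as a GeoJSON-style polygon and a raster as a small grid of land cover codes. The lake coordinates, the grid values and the class codes in `LAND_COVER` are all invented for this example.

```python
# Vector: a discrete object with an explicit boundary (a made-up lake polygon).
lake = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
        ],
    },
    "properties": {"name": "example lake"},
}

# Raster: a grid of cells; each value is the class observed in that cell.
# The class codes are invented for this example.
LAND_COVER = {0: "water", 1: "forest", 2: "urban"}
grid = [
    [1, 1, 2],
    [0, 0, 2],
    [0, 1, 1],
]

def cell_counts(grid):
    """Count how many cells fall into each land cover class."""
    counts = {}
    for row in grid:
        for value in row:
            label = LAND_COVER[value]
            counts[label] = counts.get(label, 0) + 1
    return counts

print(lake["geometry"]["type"], len(lake["geometry"]["coordinates"][0]))
print(cell_counts(grid))
```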
Vector Data Specifications
The specifications inherit the following Data Package properties:
Recommended Properties
- name
- id
- licenses
- profile
Optional Properties
- title
- description
- homepage
- version
- sources
- contributors
- keywords
- image
- created
{
#required
"name": "name of the data",
"title": "human readable label or title for the dataset",
"gis_class": "Raster data or vector data",
"file_type": "extension of format of the dataset",
"description": "A good description for the dataset",
"license": "A license",
"keywords": ["rivers", "North America"],
#keywords separated by comma
"citation": "citation for the dataset",
"spatial_ref": "Coordinate Reference System",
"[path or url]": "path to the file",
"resources": [
#For each layer, give a name and the properties
#layer one
{
"name": "Name for the layer eg.river",
"Geometry_type": "point, linestring,....", "geometry_notation":
"NoDataValue": "what represents missing values",
# define attribute data and type for each vector feature
"schema": {
"fields": [
{
"name": "data name",
"type": "data type"
},
{
"name": "data name",
"type": "data type"
},
{...}
]
}
},
#layer two
{....},
#layer three
{..}
]
}
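A minimal Python sketch of how a tool might check a descriptor shaped like the template above. The template does not say exactly which keys are required, so the two key lists below are assumptions, and `check_descriptor` and the sample descriptor are hypothetical.

```python
# Assumed required keys; the proposal does not pin these down.
REQUIRED_PACKAGE_KEYS = ("name", "gis_class", "spatial_ref", "resources")
REQUIRED_LAYER_KEYS = ("name", "Geometry_type", "schema")

def check_descriptor(descriptor):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing package key: {key}"
        for key in REQUIRED_PACKAGE_KEYS
        if key not in descriptor
    ]
    for index, layer in enumerate(descriptor.get("resources", [])):
        for key in REQUIRED_LAYER_KEYS:
            if key not in layer:
                problems.append(f"layer {index}: missing key: {key}")
        for field in layer.get("schema", {}).get("fields", []):
            if "name" not in field or "type" not in field:
                problems.append(f"layer {index}: field needs name and type")
    return problems

# Invented descriptor: the second layer is deliberately incomplete.
descriptor = {
    "name": "rivers",
    "gis_class": "vector",
    "spatial_ref": "EPSG:4326",
    "resources": [
        {
            "name": "river",
            "Geometry_type": "linestring",
            "schema": {"fields": [{"name": "length_km", "type": "number"}]},
        },
        {"name": "gauge"},
    ],
}

print(check_descriptor(descriptor))
```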
Rasters
Like the vector data specifications, the raster data specifications inherit the core components of the Data Package specification. Rasters can have multiple nested datasets within a file; however, the JSON schema takes a structure similar to that of the vector data schema.
The data package
Json schema example
{
#required
"name": "name of the data",
"title": "human readable label or title for the dataset",
"format": "extension of format of the dataset or driver required",
"file_size": "size of file on disk",
"group_count": "Number of groups in the dataset if applicable",
"dataset_count": "The number of individual datasets",
"description": "A good description for the dataset",
"license": "A license",
"keywords": ["carbon map", "North America"],
#keywords separated by comma
"citation": "citation for the dataset",
"version": "The version of the dataset",
"homepage": "The home page of the data",
"datum": "Coordinate Reference System",
"[url or path]": "link to where the data is stored",
#each band is defined
"resources": [
{
"Group": "Name for the group if applicable",
"name": "Name for the band",
"relative_path": "Location relative to root path/url above",
"resolution": "The resolution",
"resolution_units": "The units of resolution",
"dimensions": "dimensions",
"noDataValue": "pixels where data is missing or no data collected",
"geoTransform": "The transformation of the dataset",
"parameter": "The parameter or feature",
"extent": ["the extent values of the band"]
},
{ ...}
]
}
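One way the "geoTransform", "dimensions" and "extent" entries of a band could relate is sketched below in Python. It assumes the six-value affine layout used by GDAL (x origin, pixel width, row rotation, y origin, column rotation, pixel height) and assumes "dimensions" means [width, height] in pixels; the proposal itself does not state either, and the band values are made up.

```python
def extent_from_geotransform(geotransform, width, height):
    """Return [xmin, ymin, xmax, ymax] for an unrotated raster.

    Assumes the GDAL-style layout:
    (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
    """
    x_origin, pixel_width, row_rot, y_origin, col_rot, pixel_height = geotransform
    if row_rot != 0 or col_rot != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    x_far = x_origin + width * pixel_width
    y_far = y_origin + height * pixel_height  # pixel_height is usually negative
    return [
        min(x_origin, x_far),
        min(y_origin, y_far),
        max(x_origin, x_far),
        max(y_origin, y_far),
    ]

# Invented band: 4 x 2 pixels of 0.5 units, top-left corner at (100, 50).
band = {
    "name": "band 1",
    "dimensions": [4, 2],  # assumed to mean [width, height]
    "geoTransform": [100.0, 0.5, 0.0, 50.0, 0.0, -0.5],
}
band["extent"] = extent_from_geotransform(band["geoTransform"], *band["dimensions"])
print(band["extent"])
```

If the extent can be derived like this, storing it in the descriptor is a convenience for readers who do not want to open the raster, at the cost of keeping two values in sync.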
Thanks @Stephen-Gates for comments in #499. Could you transfer them to this issue?
Thanks for this Henry.
I think a worked example using real data would help to clearly separate what's needed in a:
- spatial data package - similar to tabular data package. E.g.
  - Each resource MUST be a Spatial Data Resource
  - or could a mix of Spatial and Tabular data be in a package?
  - should spatial and temporal extent be described at this level or for each resource?
- spatial data resource - similar to tabular data resource. E.g.
  - the spatial reference system must be included.
  - the supported file types (GeoJSON, GML, etc)
  - would a CSV with point data be a valid resource?
- "layer schema" - similar to table schema
Thanks for starting the conversation.
@Stephen-Gates, Thanks for the suggestion, I will get some sample data to annotate as examples.
This is being further developed, and feedback is very welcome in the issues, at https://github.com/cividi/spatial-data-package
@loleg that's great ... could you provide a brief summary of state and plans here?
Hi @rufuspollock, thanks for checking in.
Very happy to get some feedback on https://github.com/cividi/spatial-data-package#detailed-data-package-structure. A proof-of-concept viewer is implemented in dfour and deployed, for example, for simple web publication of client projects (gemeindescan.ch), for self-publishing at events (sandbox.dfour.space or campusbochum.de), and for public participation (https://beteiligung.campusbochum.de/de/SDY4F/0N2AQB/).
Pros
- no dependency on a specific library or implementation -> independent of renderer, e.g. simple styles spec supported in many map libraries and tools (e.g. geojson.io, GitHub Previews, ...)
- styles "baked in" -> curated snapshot, human readable, no interpretation needed
Cons
- requires extra tooling to create styles: hard to update or change style, e.g. we wrote a special QGIS Plugin
- style not declarative/rule based -> no support for complex style definitions (e.g. zoom based)
- currently requires/only supports (inline) geojson -> no support for tabular data, e.g. CSV(T) or other frictionless compliant geo data
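To show what "baked in" styles mean in practice, here is a small Python sketch that writes simple-style properties (fill, fill-opacity, stroke, stroke-width) directly into a GeoJSON feature. The helper `bake_style` and the sample feature are hypothetical and are not taken from the cividi tooling.

```python
import json

def bake_style(feature, style):
    """Return a copy of the feature with style keys merged into its properties."""
    styled = dict(feature)
    styled["properties"] = {**(feature.get("properties") or {}), **style}
    return styled

# Made-up feature used only for illustration.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
    },
    "properties": {"name": "example area"},
}

styled = bake_style(
    feature,
    {"fill": "#ff0000", "fill-opacity": 0.4, "stroke": "#333333", "stroke-width": 2},
)
print(json.dumps(styled["properties"], indent=2))
```

Because the style lives next to the data in each feature, any renderer that understands these properties shows the same snapshot, but changing the look means rewriting the features, which is the trade-off listed under Cons.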
Potential options
- Separate data and style definition, e.g. similar to Vega-Lite, but an abstraction of mapbox-gl styles
- Vega-Lite geo