odis-arch
[WIS2] Provide sample mapping/GeoJSON to WMO
Previous discussion notes
from @fils
...a quick version 0 for you to look at. The link that follows is a GeoJSON file. It is only for OBIS and is only the polygon geometries. If they have points or lines I will need to address them separately as converting from schema.org spatial to real geometry requires a converter for each. I knew there were many polygons, so I started with that. The sequence of the workflow was:
- load the OBIS release graph from the OIH S3 object store
- this is loaded into pyld as a graph and I do a SPARQL query with the results going into a pandas dataframe
- convert the schema.org polygons to WKT
- convert the pandas dataframe into a geopandas dataframe and in the process convert WKT strings to geopandas geometries
- export geopandas to geojson
The export file is at https://github.com/iodepo/odis-arch/blob/master/archinterfaces/ODIS-WIS2/output/oih_obis_wmo.geojson. It looks solid blue because there are so many polygons that they blanket the world. If I plot with alpha set very low I can get something like the following image:
The properties block in the GeoJSON looks like:
"properties": {
"s": "<https://obis.org/dataset/d64477cf-491f-4de5-8291-8c07986fa37e>",
"name": "Canary Islands - OAG (aggregated per 1-degree cell)",
"description": "Original ...",
"geotype": "schema:GeoShape",
"geompred": "schema:polygon",
"geom": "-74.5 5.5,-74.5 45.5,32.5 45.5,32.5 5.5,-74.5 5.5",
"WKT": "POLYGON ((-74.5 5.5, -74.5 45.5, 32.5 45.5, 32.5 5.5, -74.5 5.5))"
},
Where `s` is the subject IRI from the graph. I will convert it from IRI to literal to remove the <> unless you want them.
OK, this is just my first draft but I wanted to get it out to you sooner than later.
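The polygon-to-WKT step in the workflow above can be sketched with the standard library alone. This is a minimal sketch and not the notebook's actual code: the function names `schema_polygon_to_wkt` and `iri_to_literal` are hypothetical, and it assumes the `geom` string is a comma-separated list of space-separated coordinate pairs, as in the sample properties block. Coordinate order is kept exactly as given; whether that axis order is right for a given source is not checked here.

```python
def schema_polygon_to_wkt(geom: str) -> str:
    """Turn 'x y,x y,...' (the shape of the sample `geom` value) into a WKT POLYGON."""
    pairs = [pair.split() for pair in geom.split(",")]
    if any(len(pair) != 2 for pair in pairs):
        raise ValueError(f"expected 'x y' pairs, got: {geom!r}")
    # Assumption: an open ring should be closed by repeating the first point.
    if pairs[0] != pairs[-1]:
        pairs.append(pairs[0])
    if len(pairs) < 4:
        raise ValueError("a closed polygon ring needs at least four points")
    # Coordinates stay as strings so the output text matches the input exactly.
    ring = ", ".join(f"{x} {y}" for x, y in pairs)
    return f"POLYGON (({ring}))"


def iri_to_literal(term: str) -> str:
    """Strip the surrounding <> from a SPARQL-style IRI string, if present."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return term


print(schema_polygon_to_wkt("-74.5 5.5,-74.5 45.5,32.5 45.5,32.5 5.5,-74.5 5.5"))
print(iri_to_literal("<https://obis.org/dataset/d64477cf-491f-4de5-8291-8c07986fa37e>"))
```

Keeping the coordinates as strings avoids any float reformatting, so the WKT stays textually identical to the source values.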
from @tomkralidis
Thanks @fils this looks great! Comments (based on analyzing a single GeoJSON feature):
- I don’t need `geotype`, `geom`, `geompred`, or `WKT`, especially given the fact that you provide `geometry` in the payload, which is 1:1 with WIS2 WCMP2 metadata
- I do need some sort of identifier for the record. I can derive that from `s`, but it would be safer to make it more explicit
- I do need some sort of temporal property (of the data) in WCMP2. If this is not available I can make it null in WCMP2, so you can either emit it (with null as required), or, when null, do not emit which implies a null. I would prefer the former to be explicit
- can keywords be provided? Even better, qualified by thesaurus?
- I do need a record creation date
- we need to discuss data policy issues (WMO requires a data policy of “core” or “recommended”). Needs discussion at our next call
Great work here!
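Two of the points above (dropping the unneeded columns and emitting an explicit null for a missing temporal value) could look roughly like this. It is only a sketch under assumptions: `tidy_properties` is a hypothetical helper, and `temporal` is an assumed key name for the temporal property.

```python
import json

# Keys reported as unnecessary because `geometry` already carries the shape.
UNNEEDED_KEYS = {"geotype", "geom", "geompred", "WKT"}


def tidy_properties(props: dict) -> dict:
    """Drop the unneeded keys and make sure a temporal entry is always present."""
    cleaned = {key: value for key, value in props.items() if key not in UNNEEDED_KEYS}
    # Explicit over implicit: a missing value is emitted as None, i.e. JSON null.
    cleaned.setdefault("temporal", None)
    return cleaned


sample = {
    "s": "https://obis.org/dataset/d64477cf-491f-4de5-8291-8c07986fa37e",
    "name": "Canary Islands - OAG (aggregated per 1-degree cell)",
    "geotype": "schema:GeoShape",
    "geompred": "schema:polygon",
}
print(json.dumps(tidy_properties(sample), indent=2))
```

`json.dumps` serializes Python's `None` as `null`, so the explicit form needs no special handling at export time.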
Related issue
https://github.com/iodepo/odis-arch/issues/238
@tomkralidis some items for us to review tomorrow are in
https://github.com/iodepo/odis-arch/tree/schema-dev-df/archinterfaces/ODIS-WIS2
We can go over the output, but the GeoJSON to review is at: https://github.com/iodepo/odis-arch/blob/schema-dev-df/archinterfaces/ODIS-WIS2/output/oih_obis_wmo.geojson generated via https://github.com/iodepo/odis-arch/blob/schema-dev-df/archinterfaces/ODIS-WIS2/extraction_WMO.ipynb
I removed the unneeded columns and took a first crack at rolling up the SPARQL keywords into a single keyword parameter in the GeoJSON. We can talk about whether that is the correct way to do it.
We've been re-organizing the repo layout to be a bit more logical, so sorry it's been breaking a few links.
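For discussion, here is one possible shape for the keyword roll-up, collecting a list per subject rather than a single joined string. This is a sketch and not what the notebook does: it assumes the SPARQL results arrive as one row per subject/keyword pair, and the row keys `s` and `keyword`, the function name, and the example.org rows are all hypothetical.

```python
from collections import defaultdict


def roll_up_keywords(rows):
    """Group (subject, keyword) rows into a sorted, de-duplicated list per subject."""
    grouped = defaultdict(set)
    for row in rows:
        bucket = grouped[row["s"]]  # touching the key keeps subjects with no keywords
        keyword = (row.get("keyword") or "").strip()
        if keyword:
            bucket.add(keyword)
    return {subject: sorted(words) for subject, words in grouped.items()}


# Placeholder rows, not real query output.
rows = [
    {"s": "https://example.org/dataset/1", "keyword": "Occurrence"},
    {"s": "https://example.org/dataset/1", "keyword": "Observation"},
    {"s": "https://example.org/dataset/1", "keyword": "Occurrence"},
    {"s": "https://example.org/dataset/2", "keyword": None},
]
print(roll_up_keywords(rows))
```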
Thanks @fils / @jmckenna. Some additional comments based on the first feature in the GeoJSON at https://github.com/iodepo/odis-arch/blob/schema-dev-df/archinterfaces/ODIS-WIS2/output/oih_obis_wmo.geojson
- add an identifier as follows (`$LOCAL_ID` is defined by you)
"id": "urn:x-wmo:md:xxg:odis:$LOCAL_ID"
- `properties.keywords` should be an array of keywords
- rename `properties.name` to `properties.title`
- move `properties.temporal` to `time`, as follows:
"time": {
"interval": [
"1834",
"2010"
]
}
- add a `properties.themes` array, as follows:
"themes": [
{
"concepts": [
{
"id": "ocean"
}
],
"scheme": "https://github.com/wmo-im/wcmp2-codelists/blob/main/codelists/earth-system-discipline.csv"
}
],
- add a conformance property as follows:
"conformsTo": [
"http://wis.wmo.int/spec/wcmp/2/conf/core"
]
- add a `properties.contacts` array, as per the example below:
"contacts": [
{
"name": "National Inquiry Response Team",
"organization": "Government of Canada; Environment and Climate Change Canada; Meteorological Service of Canada",
"phones": [
{
"value": "+18199972800"
}
],
"emails": [
{
"value": "[email protected]"
}
],
"addresses": [
{
"deliveryPoint": [
"77 Westmorland Street, suite 260"
],
"city": "Fredericton",
"administrativeArea": "NB",
"postalCode": "E3B 6Z4",
"country": "Canada"
}
],
"links": [
{
"rel": "canonical",
"type": "text/html",
"href": "https://www.canada.ca/en/environment-climate-change.html"
}
],
"roles": [
"producer"
]
}
],
- add `properties.type`, where possible values are per https://github.com/wmo-im/wcmp2-codelists/blob/main/codelists/resource-type.csv (`Name` column values)
- add `properties.created`
- add `properties.wmo:dataPolicy` (put `recommended` for now, while we sort out the details)
- move `properties.identifier` to `properties.externalIds`, as follows:
"externalIds": [{
"scheme": "doi",
"value": "https://doi.org/10.17031/2x2hau"
}]
- move `properties.s` to `links`, as follows:
"links": [{
"rel": "related",
"href": "https://obis.org/dataset/9afeee64-62f3-44b9-a2fb-794b2afcf50a",
"type": "text/html",
"title": "Full dataset information"
}]
For reference:
- draft specification: https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html
- examples: https://github.com/wmo-im/wcmp2/blob/main/examples
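The changes listed above could be applied to a single feature roughly as follows. This is a sketch under several assumptions and not a validated WCMP2 encoder: `to_wcmp2_like` is a hypothetical name; the input is assumed to carry `s`, `name`, `description`, `keywords` (a list or a comma-separated string), `temporal` (a `start/end` string or null), and `identifier` (assumed to be a DOI); the local ID is taken from the last path segment of `s`; `"dataset"` is an assumed value for `properties.type`; and placing `conformsTo` at the top level of the record is my reading of the draft specification. Contacts are left out because they need real organization details.

```python
import json
from datetime import datetime, timezone
from typing import Optional

THEME_SCHEME = ("https://github.com/wmo-im/wcmp2-codelists/blob/main/"
                "codelists/earth-system-discipline.csv")
CONFORMANCE = "http://wis.wmo.int/spec/wcmp/2/conf/core"


def to_wcmp2_like(feature: dict, created: Optional[str] = None) -> dict:
    """Reshape a flat OBIS feature along the lines requested in this thread."""
    props = dict(feature.get("properties", {}))
    subject = props["s"]
    # Assumption: the trailing path segment of the subject IRI is a usable local ID.
    local_id = subject.rstrip("/").rsplit("/", 1)[-1]

    keywords = props.get("keywords") or []
    if isinstance(keywords, str):
        keywords = [word.strip() for word in keywords.split(",") if word.strip()]

    # Assumption: temporal arrives as "start/end"; a missing value becomes an explicit null.
    temporal = props.get("temporal")
    time = {"interval": temporal.split("/")} if temporal else None

    new_props = {
        "title": props.get("name"),
        "description": props.get("description"),
        "keywords": keywords,
        "themes": [{"concepts": [{"id": "ocean"}], "scheme": THEME_SCHEME}],
        "type": "dataset",  # assumed value; check against the resource-type codelist
        "created": created or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "wmo:dataPolicy": "recommended",
    }
    if props.get("identifier"):
        new_props["externalIds"] = [{"scheme": "doi", "value": props["identifier"]}]

    return {
        "id": f"urn:x-wmo:md:xxg:odis:{local_id}",
        "type": "Feature",
        "conformsTo": [CONFORMANCE],
        "geometry": feature.get("geometry"),
        "time": time,
        "properties": new_props,
        "links": [{"rel": "related", "href": subject, "type": "text/html",
                   "title": "Full dataset information"}],
    }


# Placeholder input assembled from snippets in this thread; not a real record.
example = {
    "type": "Feature",
    "geometry": None,
    "properties": {
        "s": "https://obis.org/dataset/9afeee64-62f3-44b9-a2fb-794b2afcf50a",
        "name": "Placeholder title",
        "keywords": "Occurrence, Observation",
        "temporal": "1834/2010",
        "identifier": "https://doi.org/10.17031/2x2hau",
    },
}
print(json.dumps(to_wcmp2_like(example, created="2024-01-01T00:00:00Z"), indent=2))
```

Passing `created` explicitly keeps the output reproducible; when it is omitted the sketch falls back to the current UTC time.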
@tomkralidis
Thanks for the detailed issue. I follow.
One issue I have with this workflow is that what I am really doing is going from SPARQL to GeoJSON. I originally passed through GeoPandas since SPARQL -> Pandas -> GeoPandas -> GeoJSON was easy. However, it is not very flexible.
I'm thinking about still leveraging SPARQL to Pandas as an easy way to go from query to data frame. However, I may simply jump straight from Pandas to GeoJSON via something like https://github.com/jazzband/geojson
Just looking for a nice Pythonic builder for GeoJSON. If you have any better tooling or library suggestions for programmatically building GeoJSON I am easily influenced. ;)
I guess there are many packages and approaches, and `geojson` or `shapely` can help with valid geometry representation. IMHO GeoJSON in Python parlance is a dictionary, and working directly with Python primitives and built-ins (i.e. `json`) is a super low barrier approach with a lean dependency chain for tooling.
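To make the plain-dictionary approach concrete, here is a minimal sketch that uses only `json` and built-in types. The helper names `polygon_feature` and `feature_collection` and the row layout (`title`, `ring`) are hypothetical; rows of this kind could come from something like `DataFrame.to_dict("records")`. The sample values are taken from the properties block quoted earlier in this thread.

```python
import json


def polygon_feature(ring, properties, feature_id=None):
    """Build a GeoJSON Feature dict for a single-ring polygon."""
    coords = [[float(x), float(y)] for x, y in ring]
    # Close the ring if the first and last positions differ.
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    feature = {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        "properties": dict(properties),
    }
    if feature_id is not None:
        feature["id"] = feature_id
    return feature


def feature_collection(features):
    """Wrap an iterable of Feature dicts in a FeatureCollection dict."""
    return {"type": "FeatureCollection", "features": list(features)}


rows = [
    {
        "title": "Canary Islands - OAG (aggregated per 1-degree cell)",
        "ring": [(-74.5, 5.5), (-74.5, 45.5), (32.5, 45.5), (32.5, 5.5), (-74.5, 5.5)],
    },
]
collection = feature_collection(
    polygon_feature(row["ring"], {"title": row["title"]}) for row in rows
)
print(json.dumps(collection, indent=2))
```

Because the result is an ordinary dictionary, extra top-level members such as `time` or `links` can be added with plain key assignment before serializing.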
Note that yesterday we did some more repo cleaning-up, and the WIS2 workspace is now at /master/archinterfaces/ODIS-WIS2. The sample GeoJSON file lives in the `output` folder there.
We've also agreed that all dev work will now happen in the `master` branch, so it should be easier to receive contributions/changes moving forward.