
[WIS2] Provide sample mapping/GeoJSON to WMO

Open jmckenna opened this issue 1 year ago • 5 comments

Previous discussion notes

from @fils

...a quick version 0 for you to look at. The link that follows is a GeoJSON file. It covers only OBIS and contains only the polygon geometries. If they have points or lines, I will need to address them separately, since converting from schema.org spatial to real geometry requires a converter for each geometry type. I knew there were many polygons, so I started with those. The workflow was:

  • load the OBIS release graph from the OIH S3 object store
  • this is loaded into pyld as a graph and I do a SPARQL query with the results going into a pandas dataframe
  • convert the schema.org polygons to WKT
  • convert the pandas dataframe into a geopandas dataframe and in the process convert WKT strings to geopandas geometries
  • export geopandas to geojson
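A minimal sketch of the polygon conversion step (schema.org polygon string to WKT, plus a GeoJSON geometry built directly without GeoPandas). It assumes every geom value is a comma-separated list of "x y" (longitude latitude) pairs, like the sample properties block below. schema.org's own description of polygon uses space-delimited latitude/longitude points, so sources that follow it literally would need a different parser. The function names are hypothetical.

```python
def schema_polygon_to_wkt(geom: str) -> str:
    """Wrap a comma-separated "x y" ring (as in the OBIS sample) in WKT POLYGON syntax."""
    # Normalize whitespace inside each pair and drop empty pieces from stray commas.
    pairs = [" ".join(p.split()) for p in geom.split(",") if p.strip()]
    return "POLYGON ((" + ", ".join(pairs) + "))"


def schema_polygon_to_geojson(geom: str) -> dict:
    """Build a GeoJSON Polygon geometry directly from the same ring string."""
    ring = []
    for pair in geom.split(","):
        if not pair.strip():
            continue  # ignore stray/trailing commas
        x, y = (float(v) for v in pair.split())
        ring.append([x, y])
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least three points")
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON linear rings must be closed
    return {"type": "Polygon", "coordinates": [ring]}
```

Emitting the geometry as a plain dict keeps the WKT column optional: it is only needed if something downstream still expects WKT.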

The export file is at https://github.com/iodepo/odis-arch/blob/master/archinterfaces/ODIS-WIS2/output/oih_obis_wmo.geojson. It looks solid blue because there are so many polygons that they blanket the world. If I plot with a very low alpha, I get something like the following image:

[Image: OBIS dataset polygons plotted with a very low alpha]

The properties block in the GeoJSON looks like:

"properties": {
        "s": "<https://obis.org/dataset/d64477cf-491f-4de5-8291-8c07986fa37e>",
        "name": "Canary Islands - OAG (aggregated per 1-degree cell)",
        "description": "Original ...",
        "geotype": "schema:GeoShape",
        "geompred": "schema:polygon",
        "geom": "-74.5 5.5,-74.5 45.5,32.5 45.5,32.5 5.5,-74.5 5.5",
        "WKT": "POLYGON ((-74.5 5.5, -74.5 45.5, 32.5 45.5, 32.5 5.5, -74.5 5.5))"
      },

Where s is the subject IRI from the graph. I will convert it from an IRI to a literal to remove the <> unless you want them. OK, this is just my first draft, but I wanted to get it out to you sooner rather than later.

from @tomkralidis

Thanks @fils this looks great! Comments (based on analyzing a single GeoJSON feature):

  • I don’t need geotype, geom, geompred, or WKT, especially since you provide the geometry in the payload, which maps 1:1 to WIS2 WCMP2 metadata
  • I do need some sort of identifier for the record. I can derive that from s, but it would be safer to make it more explicit
  • I do need some sort of temporal property (of the data) in WCMP2. If this is not available I can make it null in WCMP2, so you can either emit it (as null when needed) or, when it is null, not emit it, which implies null. I would prefer the former, to be explicit
  • can keywords be provided? Even better, qualified by thesaurus?
  • I do need a record creation date
  • we need to discuss data policy issues (WMO requires a data policy of “core” or “recommended”). Needs discussion at our next call
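A rough sketch of the first three points, assuming the property names from the sample properties block above: drop the columns that are not needed, derive a local record identifier from the last path segment of s, and always emit the temporal value, as null when unknown. The helper names and the local_id/temporal keys are hypothetical placeholders, not WCMP2 field names.

```python
from typing import Optional

# Columns the WMO side does not need; the geometry travels in the feature itself.
UNNEEDED = {"geotype", "geom", "geompred", "WKT"}


def local_id_from_subject(s: str) -> str:
    """Return the last path segment of the subject IRI (e.g. the OBIS dataset UUID)."""
    iri = s.strip().strip("<>")  # tolerate the bracketed form shown in the sample
    return iri.rstrip("/").rsplit("/", 1)[-1]


def trim_properties(props: dict, interval: Optional[list] = None) -> dict:
    """Drop unneeded keys, add a local identifier, and always emit a temporal value."""
    out = {k: v for k, v in props.items() if k not in UNNEEDED}
    out["local_id"] = local_id_from_subject(props["s"])  # hypothetical key name
    out["temporal"] = interval  # explicit None (JSON null) when unknown
    return out
```

Setting the key to None rather than leaving it out follows the stated preference for an explicit null.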

Great work here!

Related issue

https://github.com/iodepo/odis-arch/issues/238

Posted by jmckenna, Aug 15 '23 18:08

@tomkralidis some items for us to review tomorrow are in

https://github.com/iodepo/odis-arch/tree/schema-dev-df/archinterfaces/ODIS-WIS2

We can go over the output, but the GeoJSON to review is at: https://github.com/iodepo/odis-arch/blob/schema-dev-df/archinterfaces/ODIS-WIS2/output/oih_obis_wmo.geojson generated via https://github.com/iodepo/odis-arch/blob/schema-dev-df/archinterfaces/ODIS-WIS2/extraction_WMO.ipynb

I removed the unneeded columns and took a first crack at rolling up the SPARQL keywords into a single keyword parameter in the GeoJSON. We can talk about whether that is the correct way to do it.

We've been reorganizing the repo layout to be a bit more logical, so sorry that it has broken a few links.
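One way to do the keyword roll-up, sketched with only the standard library. It assumes the SPARQL results arrive as one row per (subject, keyword) pair, with hypothetical column names s and keyword. It yields an array of keywords per record rather than a single string; the function name is also hypothetical.

```python
from collections import defaultdict


def roll_up_keywords(rows: list) -> dict:
    """Group (subject, keyword) rows into one keyword list per subject.

    Blank keywords are skipped and duplicates are dropped, keeping first-seen order.
    """
    grouped = defaultdict(list)
    for row in rows:
        kw = (row.get("keyword") or "").strip()
        if not kw:
            continue
        if kw not in grouped[row["s"]]:
            grouped[row["s"]].append(kw)
    return dict(grouped)
```

If the rows are already in a pandas dataframe, df.groupby("s")["keyword"].agg(list) gives roughly the same grouping, without the blank filtering and de-duplication.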

Posted by fils, Aug 16 '23 21:08

Thanks @fils / @jmckenna. Some additional comments based on the first feature in the GeoJSON at https://github.com/iodepo/odis-arch/blob/schema-dev-df/archinterfaces/ODIS-WIS2/output/oih_obis_wmo.geojson

  • add an identifier as follows ($LOCAL_ID is defined by you)
"id": "urn:x-wmo:md:xxg:odis:$LOCAL_ID"
  • properties.keywords should be an array of keywords
  • rename properties.name to properties.title
  • move properties.temporal to time, as follows:
    "time": {
        "interval": [
            "1834",
            "2010"
        ]   
    }
  • add a properties.themes array, as follows:
        "themes": [
            {
                "concepts": [
                    {
                        "id": "ocean"
                    }
                ],
                "scheme": "https://github.com/wmo-im/wcmp2-codelists/blob/main/codelists/earth-system-discipline.csv"
            }
        ],
  • add a conformance property as follows:
    "conformsTo": [
        "http://wis.wmo.int/spec/wcmp/2/conf/core"
    ]
  • add a properties.contacts array, as follows per the below example:
        "contacts": [
            {
                "name": "National Inquiry Response Team",
                "organization": "Government of Canada; Environment and Climate Change Canada; Meteorological Service of Canada",
                "phones": [
                    {
                        "value": "+18199972800"
                    }
                ],
                "emails": [
                    {
                        "value": "[email protected]"
                    }
                ],
                "addresses": [
                    {
                        "deliveryPoint": [
                            "77 Westmorland Street, suite 260"
                        ],
                        "city": "Fredericton",
                        "administrativeArea": "NB",
                        "postalCode": "E3B 6Z4",
                        "country": "Canada"
                    }
                ],
                "links": [
                    {
                        "rel": "canonical",
                        "type": "text/html",
                        "href": "https://www.canada.ca/en/environment-climate-change.html"
                    }
                ],
                "roles": [
                    "producer"
                ]
            }
        ],
  • add properties.type, where possible values are per https://github.com/wmo-im/wcmp2-codelists/blob/main/codelists/resource-type.csv (Name column values)
  • add properties.created
  • add properties.wmo:dataPolicy (put recommended for now, while we sort out the details)
  • move properties.identifier to properties.externalIds, as follows:
        "externalIds": [{
            "scheme": "doi",
            "value": "https://doi.org/10.17031/2x2hau"
        }]
  • move properties.s to links, as follows:
        "links": [{
            "rel": "related",
            "href": "https://obis.org/dataset/9afeee64-62f3-44b9-a2fb-794b2afcf50a",
            "type": "text/html",
            "title": "Full dataset information"
        }]
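Pulling the requested changes together, a minimal sketch of a builder that emits one WCMP2-style feature from a flattened row. It assumes the row carries s, name, description and a keywords list, that the geometry is already a GeoJSON dict, and that conformsTo, time and links sit at the top level of the feature alongside id. The function name, its parameters and the fixed "ocean" theme are assumptions; resource_type should be a Name value from the resource-type codelist, and contacts should follow the structure in the example above. Worth checking the output against the draft specification before relying on it.

```python
import datetime
from typing import Optional

WCMP2_CORE = "http://wis.wmo.int/spec/wcmp/2/conf/core"
DISCIPLINE_SCHEME = "https://github.com/wmo-im/wcmp2-codelists/blob/main/codelists/earth-system-discipline.csv"


def to_wcmp2(row: dict, geometry: dict, local_id: str, resource_type: str,
             interval: Optional[list] = None, doi: Optional[str] = None,
             contacts: Optional[list] = None) -> dict:
    """Assemble one WCMP2-style GeoJSON feature from a flattened OIH row (sketch)."""
    created = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    properties = {
        "type": resource_type,                      # Name value from resource-type.csv
        "title": row["name"],                       # renamed from name
        "description": row.get("description"),
        "keywords": list(row.get("keywords", [])),  # must be an array
        "themes": [{
            "concepts": [{"id": "ocean"}],
            "scheme": DISCIPLINE_SCHEME,
        }],
        "contacts": contacts or [],                 # caller supplies real contacts
        "created": created,                         # record creation date (UTC)
        "wmo:dataPolicy": "recommended",            # placeholder until policy is settled
    }
    if doi:
        properties["externalIds"] = [{"scheme": "doi", "value": doi}]
    return {
        "id": f"urn:x-wmo:md:xxg:odis:{local_id}",
        "conformsTo": [WCMP2_CORE],
        "type": "Feature",
        # Explicit null when the temporal extent is unknown.
        "time": {"interval": interval} if interval else None,
        "geometry": geometry,
        "properties": properties,
        "links": [{
            "rel": "related",
            "href": row["s"].strip().strip("<>"),   # subject IRI as a plain literal
            "type": "text/html",
            "title": "Full dataset information",
        }],
    }
```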

For reference:

  • draft specification: https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html
  • examples: https://github.com/wmo-im/wcmp2/blob/main/examples

Posted by tomkralidis, Aug 16 '23 22:08

@tomkralidis

Thanks for the detailed issue. I follow.

One issue I have with this workflow: what I am really doing is going from SPARQL to GeoJSON. I originally passed through GeoPandas, since SPARQL -> Pandas -> GeoPandas -> GeoJSON was easy. However, it is not very flexible.

I'm thinking about still leveraging SPARQL to Pandas as an easy way to go from query to data frame. However, I may simply jump straight from Pandas to GeoJSON via something like https://github.com/jazzband/geojson

Just looking for a nice Pythonic builder for GeoJSON. If you have any better tooling or library suggestions for programmatically building GeoJSON, I am easily influenced. ;)

Posted by fils, Aug 16 '23 23:08

Quoting @fils: "Just looking for a nice Pythonic builder for GeoJSON. If you have any better tooling or library suggestions for programmatically building GeoJSON, I am easily influenced. ;)"

I guess there are many packages and approaches, and geojson or shapely can help with valid geometry representation. IMHO GeoJSON in Python parlance is a dictionary, and working directly with Python primitives and built-ins (e.g. the json module) is a very low-barrier approach with a lean dependency chain for tooling.
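Following that suggestion, a minimal sketch using only the standard library: features are plain dicts, wrapped in a FeatureCollection and serialized with json. The light per-feature check (a type of "Feature" plus geometry and properties members, either of which may be null) follows RFC 7946; the function name is hypothetical.

```python
import json


def feature_collection_json(features: list, indent: int = 2) -> str:
    """Wrap plain-dict GeoJSON features in a FeatureCollection and serialize it."""
    for i, feat in enumerate(features):
        # RFC 7946: a Feature has type "Feature" plus geometry and properties
        # members; both may be null, but the members must be present.
        if feat.get("type") != "Feature" or "geometry" not in feat or "properties" not in feat:
            raise ValueError(f"feature {i} is not a valid GeoJSON Feature")
    collection = {"type": "FeatureCollection", "features": features}
    # ensure_ascii=False keeps non-ASCII text (e.g. place names) readable in the file.
    return json.dumps(collection, ensure_ascii=False, indent=indent)
```

Writing the result is then just open(path, "w", encoding="utf-8") and a single write, with no GeoPandas in the dependency chain.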

Posted by tomkralidis, Aug 17 '23 01:08

Note that yesterday we did some more repo clean-up, and the WIS2 workspace is now at /master/archinterfaces/ODIS-WIS2. The sample GeoJSON file lives in the output folder there.

We've also agreed that all dev work will now happen in the master branch, so it should be easier to receive contributions/changes moving forward

Posted by jmckenna, Aug 17 '23 11:08