odis-arch

EMODNet: validate & index to graph (the new unified EMODNet service)

Open jmckenna opened this issue 2 years ago • 22 comments

  • EMODNet & VLIZ services were recently unified into 1 service
  • sources.yaml in the schema-dev branch now has the updated sitemap and service endpoints for emodnet
  • sitemapCheck returns VALID : 2207 unique resources in 1 sitemap URL
  • JSON-LD is embedded in each record's page, and validates through schema.org validator
    • response of validator with one record: https://validator.schema.org/#url=https%3A%2F%2Femodnet.ec.europa.eu%2Fgeonetwork%2Fsrv%2Fapi%2Frecords%2Fbf91a19e5d3f2e29f0426eab8766a6bec350561c
  • EMODNet team is interested in how the new service indexes with ODIS & awaits our feedback
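The "unique resources" figure above can be reproduced in spirit with a few lines of Python. This is only a sketch of the idea, not how sitemapCheck itself is implemented; the `unique_locs` helper and the example.org URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def unique_locs(sitemap_xml: str) -> set[str]:
    """Return the distinct <loc> URLs found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


# Hypothetical sitemap with one repeated entry.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/records/a</loc></url>
  <url><loc>https://example.org/records/b</loc></url>
  <url><loc>https://example.org/records/a</loc></url>
</urlset>"""

print(len(unique_locs(sample)))
```

In practice the XML would be fetched from the sitemap URL listed in sources.yaml before being passed to the helper.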

cc @tim-collart

jmckenna avatar Dec 08 '22 16:12 jmckenna

I processed this with Gleaner. The results are:

  SitemapCount: 2208
   SitemapStored: 1210
   SitemapIssues: 971

A review of some of the errors shows that many are like the following.

see also: https://tinyurl.com/y9zxpzqv for the graph in the JSON-LD playground.

ISSUE#1

The description property has a string literal value that includes unescaped quotation marks. This breaks parsing of the JSON-LD. The quotation marks need to be removed or escaped.

For example, at https://emodnet.ec.europa.eu/geonetwork/srv/api/records/f895d2e2-1434-4118-8f4d-4351e3f63beb you can view the source and the JSON-LD. You will find a section like:

        {
        "@type":"DataDownload",
        "contentUrl":"https://www.emodnet-seabedhabitats.eu/access-data/launch-map-viewer/?zoom=8&center=7.67098,53.65892&layerIds=521&baseLayerId=1&activeFilters=NobwRANghgngpgJwJIBMwC4CsAmAjAGjADMBLCAF0VQ0wBZDSLEAZAe1YGsBXAB1QGcMwALoMylBABU4AD3IYwAEQCiABlUBmVbgBsYAL7CgA",
        "encodingFormat":"WWW:LINK-1.0-http--link",
        "name":"EMODnet Seabed Habitats Map Viewer",
        "description":"View map "DE003016" on the EMODnet Seabed Habitats Map Viewer"
        }

ISSUE#2

Related issue on return characters:

 invalid character '\\n' in string literal
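Both issues come down to string values being written into the JSON-LD without escaping. A minimal Python sketch of the failure and of one way to avoid it (serialising with a JSON library rather than by string templating) follows. The `parses` helper is hypothetical, and the second line of text in `record` is invented for the example.

```python
import json

# ISSUE#1: an unescaped double quote inside a string value.
broken_quote = '{"description": "View map "DE003016" on the EMODnet Seabed Habitats Map Viewer"}'

# ISSUE#2: a raw line break inside a string value (the \n below is a real
# newline character in the text handed to the parser).
broken_newline = '{"description": "line one\nline two"}'


def parses(text: str) -> bool:
    """Return True if the text is syntactically valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Letting the serialiser build the string escapes quotes and newlines.
record = {"description": 'View map "DE003016" on the EMODnet Seabed Habitats Map Viewer\nSecond line'}
fixed = json.dumps(record)

print(parses(broken_quote), parses(broken_newline), parses(fixed))  # False False True
```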

fils avatar Dec 09 '22 13:12 fils

Here is the schema.org Validator, clearly breaking on that description : https://validator.schema.org/#url=https%3A%2F%2Femodnet.ec.europa.eu%2Fgeonetwork%2Fsrv%2Fapi%2Frecords%2Ff895d2e2-1434-4118-8f4d-4351e3f63beb

Screenshot 2022-12-12 091536

jmckenna avatar Dec 12 '22 13:12 jmckenna

This is still in EMODNet's court, right? Those errors are hard blockers.

pbuttigieg avatar Dec 17 '22 17:12 pbuttigieg

The double quotes are now escaped by applying https://github.com/geonetwork/core-geonetwork/commit/50986d1823fff9bc2eb2b895be72a5d92e5875ac

bart-v avatar Aug 11 '23 16:08 bart-v

thanks @bart-v I will test a fresh harvest from your unified service, and index into ODIS...

jmckenna avatar Aug 11 '23 17:08 jmckenna

@bart-v additional feedback after re-harvesting the unified service:

  • schema.org validator fails on "Duplicate key found" (see email property below):
    "author": [
   {
        
        
        "@id":"[email protected]@cogea.it",
        "@type":"Organization"
        
          ,"name": "Cogea srl"
          ,"email": "[email protected]"
          ,"email": "[email protected]"
  • @type should be "@type": "Dataset" instead of "@type": "schema:Dataset"

example record: https://emodnet.ec.europa.eu/geonetwork/srv/api/records/18d9daa1-eee4-4380-a2d0-fd20e4b47081
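Python's `json` module silently keeps the last value when a key is repeated, so a harvester would need an explicit check to notice the duplicate email. The sketch below uses the `object_pairs_hook` parameter of `json.loads` for that. The hook names and the example.org addresses are assumptions (the real addresses are masked in this thread), and merging repeated keys into an array is just one possible way to keep both values.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key in one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def merge_duplicates(pairs):
    """object_pairs_hook that collects repeated keys into a list.

    Note: a value that was already a list is extended rather than nested.
    """
    merged = {}
    for key, value in pairs:
        if key in merged:
            existing = merged[key]
            merged[key] = existing + [value] if isinstance(existing, list) else [existing, value]
        else:
            merged[key] = value
    return merged


def find_duplicate(text):
    """Return an error message for the first duplicate key, or None."""
    try:
        json.loads(text, object_pairs_hook=reject_duplicates)
    except ValueError as exc:
        return str(exc)
    return None


# Placeholder addresses; the structure mirrors the author entry above.
snippet = '{"@type": "Organization", "name": "Cogea srl", "email": "a@example.org", "email": "b@example.org"}'

print(find_duplicate(snippet))
print(json.loads(snippet, object_pairs_hook=merge_duplicates)["email"])
```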

jmckenna avatar Sep 20 '23 18:09 jmckenna

I'm sure the institute has 2 emails; feel free to ignore one. This is GeoNetwork, so we cannot simply change this structure.

bart-v avatar Sep 20 '23 19:09 bart-v

@bart-v EMODnet Dataset records are now visible on the OIH live search results (direct link to your records).

I had to disable the indexing of your type:Organization instances due to an issue on our side.

emodnet-oih

jmckenna avatar Sep 30 '23 11:09 jmckenna

@jmckenna @bart-v

The ODISCat entry for EMODnet doesn't tell us where the sitemap is.

J Beja says there should be 2000+ records shared rather than the 709.

pbuttigieg avatar Apr 19 '24 13:04 pbuttigieg

Is there no valid JSON-LD available for harvest without the patch in our GitHub space?

If so, then EMODnet is not a functional node yet. This is a major issue, especially as EMODnet is advertising its participation in ODIS.

There is valid JSON-LD/schema.org in the example entry here https://emodnet.ec.europa.eu/geonetwork/srv/api/records/18d9daa1-eee4-4380-a2d0-fd20e4b47081

So is this just an issue with completing the ODISCat entry?

pbuttigieg avatar Apr 19 '24 13:04 pbuttigieg

A few questions:

  • How do you harvest, via CSW or another system? I see this https://github.com/iodepo/odis-arch/blob/master/collection/scripts/emodnet-harvest.py Is that actually used?
  • You seem to host a static sitemap here https://github.com/iodepo/odis-arch/blob/master/collection/tempHosting/data-emodnet/sitemap.xml Why?

There is a dynamic sitemap here https://emodnet.ec.europa.eu/geonetwork/srv/eng/portal.sitemap?format=rdf (we just fixed a bug in that sitemap). Indexing that should always give you all EMODnet entries.

bart-v avatar Apr 19 '24 15:04 bart-v

@bart-v those old scripts were a proof-of-concept created a few years ago, but since your unified service we no longer use those scripts. We use your sitemap instead. As @pbuttigieg pointed out, can you (actually Nathalie Tonné) edit your ODISCat entry and add your sitemap link to the ODIS-Arch URL field?

Thanks for pointing to your dynamic sitemap. The sitemap link we usually use (that points to the record pages with embedded JSON-LD) is https://emodnet.ec.europa.eu/geonetwork/srv/eng/portal.sitemap

@bart-v we noticed this morning that your JSON-LD is broken (it does not validate), because the distribution property has an empty name, such as:

"distribution": [
        {
        "@type":"DataDownload",
        "contentUrl":"https://ows.emodnet.eu/geoserver/pace/ows?SERVICE=WMS&",
        "encodingFormat":"application/vnd.ogc.wms_xml",
        "name": ,
        "description": "https://ows.emodnet.eu/geoserver/pace/ows?SERVICE=WMS&"        }  
    ]

See this sample record, and see it fail in the schema.org validator
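A check like this can be scripted before a harvest. The following is a rough sketch using only the Python standard library: it pulls the embedded `application/ld+json` blocks out of a record page and reports where parsing fails. It is not the Gleaner implementation; the `JsonLdExtractor` class, the `check_page` helper and the inline page are made up for the example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so concatenate.
        if self._in_jsonld:
            self.blocks[-1] += data


def check_page(html):
    """Return 'ok' or a parse-error description for each JSON-LD block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    results = []
    for block in parser.blocks:
        try:
            json.loads(block)
            results.append("ok")
        except json.JSONDecodeError as exc:
            results.append(f"line {exc.lineno}, column {exc.colno}: {exc.msg}")
    return results


# Hypothetical page reproducing the empty "name" value shown above.
page = """<html><head>
<script type="application/ld+json">
{"@type": "DataDownload",
 "name": ,
 "description": "x"}
</script>
</head></html>"""

print(check_page(page))
```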

jmckenna avatar Apr 19 '24 17:04 jmckenna

Quoting @pbuttigieg: There is valid JSON-LD/schema.org in the example entry here https://emodnet.ec.europa.eu/geonetwork/srv/api/records/18d9daa1-eee4-4380-a2d0-fd20e4b47081

In fact, that is incorrect: that record and all EMODnet records contain invalid JSON-LD and do not validate (because of the empty name property mentioned above).

jmckenna avatar Apr 19 '24 17:04 jmckenna

@fils can you try a harvest using this sitemap? I am wondering whether that RDF format works for an ODIS harvest: https://emodnet.ec.europa.eu/geonetwork/srv/eng/portal.sitemap?format=rdf

jmckenna avatar Apr 19 '24 17:04 jmckenna

Empty Distribution.name was fixed

bart-v avatar Apr 19 '24 22:04 bart-v

We should stick with the vanilla sitemap

pbuttigieg avatar Apr 20 '24 19:04 pbuttigieg

Thanks for the fixes so far,

Looking better

https://validator.schema.org/#url=https%3A%2F%2Femodnet.ec.europa.eu%2Fgeonetwork%2Fsrv%2Fapi%2Frecords%2F847C10E349EFD10C710A1E3E8260AAC37A38D929

Content issues:

The record is double-typed as both a Dataset and an Organization, which is odd, and it has duplicate name properties, among others. Is this an error? If not, it will lead to very confusing results when dealing with EMODnet data. I think Dataset is right here; the organisation would probably be a value of a property within it.

The Creator and Author properties indicate EMODnet. Is this correct? Where is the credit for the original creator? If this was modified by EMODnet Biology, they should still credit the original creators with isBasedOn or similar properties.

The distribution values are mostly 'wrong': they point to landing pages, not direct data downloads. Landing pages can be moved to URL arrays; distributions should be reserved for direct download links (think machine-to-machine data transfer).
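One way to picture the suggested split between landing pages and direct downloads is sketched below in Python. It relies on an assumption that is not confirmed anywhere in this thread: that an encodingFormat starting with "WWW:LINK" marks a landing page. The `move_landing_pages` helper and the example.org download URL are likewise hypothetical.

```python
# Assumption: these encodingFormat prefixes indicate a landing page
# rather than a direct, machine-readable download.
LINK_LIKE_PREFIXES = ("WWW:LINK",)


def move_landing_pages(record):
    """Return a copy of the record with link-like distributions moved to "url"."""
    existing = record.get("url", [])
    urls = [existing] if isinstance(existing, str) else list(existing)
    kept = []
    for dist in record.get("distribution", []):
        fmt = dist.get("encodingFormat", "")
        if isinstance(fmt, str) and fmt.startswith(LINK_LIKE_PREFIXES):
            urls.append(dist["contentUrl"])
        else:
            kept.append(dist)
    cleaned = dict(record)
    cleaned["distribution"] = kept
    if urls:
        cleaned["url"] = urls
    return cleaned


sample = {
    "@type": "Dataset",
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://emodnet.ec.europa.eu/en/human-activities",
            "encodingFormat": "WWW:LINK-1.0-http--link",
        },
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/data.zip",  # placeholder URL
            "encodingFormat": "WWW:DOWNLOAD-1.0-http--download",
        },
    ],
}

cleaned = move_landing_pages(sample)
print(cleaned["url"])
print([d["contentUrl"] for d in cleaned["distribution"]])
```

OGC service endpoints (WMS, WFS) are left in distribution by this sketch; whether they belong there is a separate content decision.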

pbuttigieg avatar Apr 22 '24 12:04 pbuttigieg

Content issues are being fixed slowly, but that is rather long term.

bart-v avatar Apr 22 '24 12:04 bart-v

@bart-v I checked the first record in your first sitemap index, but it failed the schema.org validator:

Below is your JSON-LD (reformatted for readability). The critical issue is that your includedInDataCatalog is missing "@type": "DataCatalog", which breaks the validator. Also, many properties have no values. See below:

{
  "@context": "http://schema.org/",
  "@type": "schema:WebAPI",
  "@id": "https://emodnet.ec.europa.eu/geonetwork/srv/api/records/48ba841c-d06a-4c79-ab07-234d913eb975",
  "includedInDataCatalog": [
    {
      "url": "https://emodnet.ec.europa.eu/geonetwork/srv/search#",
      "name": ""
    }
  ],
  "inLanguage": "eng",
  "name": "GeoServer Web Map Service",
  "dateCreated": [],
  "dateModified": [
    "2024-04-22T03:44:00"
  ],
  "datePublished": [],
  "thumbnailUrl": [],
  "description": "A compliant implementation of WMS plus most of the SLD extension (dynamic styling). Can also generate PDF, SVG, KML, GeoRSS",
  "keywords": [
    "WFS",
    "WMS",
    "GEOSERVER",
    "Metadata GDI-Vl-conform"
  ],
  "author": [],
  "contributor": [],
  "creator": [
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "VLIZ",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress",
        "addressCountry": "Belgium",
        "addressLocality": "Oosten",
        "postalCode": "8400"
      }
    }
  ],
  "provider": [
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "VLIZ",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress",
        "addressCountry": "Belgium",
        "addressLocality": "Oosten",
        "postalCode": "8400"
      }
    }
  ],
  "copyrightHolder": [],
  "user": [],
  "sourceOrganization": [],
  "publisher": [],
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet.eu/geoserver/pace/ows?SERVICE=WMS&",
      "encodingFormat": "application/vnd.ogc.wms_xml",
      "description": "https://ows.emodnet.eu/geoserver/pace/ows?SERVICE=WMS&"
    }
  ],
  "encodingFormat": [
    ""
  ],
  "spatialCoverage": [],
  "license": [
    {
      "@type": "CreativeWork",
      "name": "no conditions apply"
    }
  ]
}
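The properties without values can be listed mechanically. Here is a small Python sketch, assuming the record has already been parsed into a dict; the `empty_properties` helper and its path notation are invented for this example.

```python
def empty_properties(node, path=""):
    """Yield paths whose value is an empty string, list, dict, or null."""
    if isinstance(node, dict):
        items = ((f"{path}.{key}" if path else key, value) for key, value in node.items())
    elif isinstance(node, list):
        items = ((f"{path}[{index}]", value) for index, value in enumerate(node))
    else:
        return
    for child_path, value in items:
        if value is None or value == "" or value == [] or value == {}:
            yield child_path
        else:
            yield from empty_properties(value, child_path)


# Trimmed-down version of the record above.
record = {
    "includedInDataCatalog": [
        {"url": "https://emodnet.ec.europa.eu/geonetwork/srv/search#", "name": ""}
    ],
    "name": "GeoServer Web Map Service",
    "dateCreated": [],
    "encodingFormat": [""],
}

for path in empty_properties(record):
    print(path)
```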

jmckenna avatar Apr 22 '24 19:04 jmckenna

Sample working includedInDataCatalog snippet:

    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "name": "MEDIN Discovery Metadata Portal",
        "url": "https://portal.medin.org.uk/portal/",
        "description": "The MEDIN portal contains information about more than 15000 marine datasets from over 400 UK organisations. Metadata are an enduring resource and contact details are publicly available for a long time. Please contact us if you find your contact details on the MEDIN portal and do not consent to this. ([email protected])",
        "image": "https://portal.medin.org.uk/grfx/logo.png"
    },
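The difference between the failing record and this working snippet is the "@type" on the catalogue node. A minimal Python sketch that flags catalogue entries lacking it is shown below; the `untyped_catalogs` helper is hypothetical, and it accepts both the single-object and the array form seen in this thread.

```python
def untyped_catalogs(record):
    """Return includedInDataCatalog entries that are not typed as DataCatalog."""
    value = record.get("includedInDataCatalog", [])
    entries = value if isinstance(value, list) else [value]
    return [
        entry
        for entry in entries
        if not isinstance(entry, dict) or entry.get("@type") != "DataCatalog"
    ]


failing = {
    "includedInDataCatalog": [
        {"url": "https://emodnet.ec.europa.eu/geonetwork/srv/search#", "name": ""}
    ]
}
working = {
    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "name": "MEDIN Discovery Metadata Portal",
        "url": "https://portal.medin.org.uk/portal/",
    }
}

print(len(untyped_catalogs(failing)), len(untyped_catalogs(working)))  # 1 0
```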

jmckenna avatar Apr 22 '24 19:04 jmckenna

includedInDataCatalog is fixed. All the rest will need to wait for the latest GeoNetwork and some content updates.

bart-v avatar Apr 22 '24 21:04 bart-v

The @id values in EMODnet records seem to hold what the identifier property should hold. See the example below.

The @id should point to the JSON-LD file itself for consistency. The current value of @id below should be moved to identifier.

Note that the author array also uses @id for email addresses, which it should not. Email addresses have their own property, email.


{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@type": "Dataset",
  "@id": "https://emodnet.ec.europa.eu/geonetwork/srv/api/records/5d89d371-a52a-476b-92de-a423f6d2c15d",
  "includedInDataCatalog": {
    "@type": "DataCatalog",
    "name": "European Marine Observation and Data Network catalogue",
    "description": "The European Marine Observation and Data Network (EMODnet) is a network of organisations supported by the EU’s integrated maritime policy.",
    "url": "https://emodnet.ec.europa.eu/geonetwork/srv/eng/catalog.search#/home"
  },
  "inLanguage": "eng",
  "name": "EMODnet Human Activities, Maritime Spatial Planning (MSP)",
  "dateCreated": [
    "2021-01-22"
  ],
  "dateModified": [
    "2023-08-30"
  ],
  "datePublished": [
    "2021-01-22"
  ],
  "thumbnailUrl": [
    "https://ows.emodnet-humanactivities.eu:/geonetwork/srv/api/records/5d89d371-a52a-476b-92de-a423f6d2c15d/attachments/msp3.jpg"
  ],
  "description": "The database on Maritime Spatial Planning (MSP) in the EU was created in 2021 by CETMAR for the European Marine Observation and Data Network (EMODnet). It is the result of the aggregation and harmonization of datasets provided by several sources. It is updated as soon a new plan is adopted by an EU member state and it is available for viewing and download on EMODnet web portal (Human Activities, https://emodnet.ec.europa.eu/en/human-activities). The database contains polygons, points and lines (where available) representing Maritime Spatial Planning (MSP) in the following countries: Belgium, Denmark, Estonia, Finland, Germany, Latvia, Netherlands, Poland, Spain and Sweden. Maritime Spatial Planning (MSP) database is made up of 3 types of spatial features: MSP Spatial Plan, MSP Zoning Element and MSP Supplementary Regulation. Also there is a non spatial feature called MSP Official Documentation. The distance to coast (EEA coastline shapefile) has been calculated using the UTM WGS84 Zone projected coordinate system where data fall in.",
  "keywords": [
    "Land use",
    "coastal zone planning",
    "land use planning",
    "national planning",
    "policy planning"
  ],
  "author": [
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Centro Tecnologico del Mar - Fundación CETMAR",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    }
  ],
  "contributor": [],
  "creator": [],
  "provider": [
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Federal Public Service (FPS) Health, Food Chain Safety and Environment",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected] and [email protected]",
      "@type": "Organization",
      "name": "Danish Maritime Authority (Secretariat for maritime spatial planning)",
      "email": "[email protected] and [email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Ministry of The Environment and Regionals councils (Uusimaa, Kymenlaakso, Southwest Finland, Satakunta, Ostrobothnia, Central Ostrobothnia, North Ostrobothnia and Lapland)",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected] and [email protected]",
      "@type": "Organization",
      "name": "Åland Provincial Government",
      "email": "[email protected] and [email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected] and [email protected]",
      "@type": "Organization",
      "name": "Ministry of Environmental Protection and Regional Development of The Republic of Latvia",
      "email": "[email protected] and [email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Bundesamt für Seeschifffahrt und Hydrographie (BSH)",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Ministry of Maritime Economy and Inland Navigation, Maritime offices of Gdynia, Slupsk and Szczecin",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Swedish Agency for Marine and Water Management",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Swedish Agency for Marine and Water Management",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Swedish Agency for Marine and Water Management",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Ministry of Infrastructure and the Environment (Noordzeeloket)",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Ministry of Infrastructure and the Environment (Noordzeeloket)",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Government of Spain - Ministry for Ecological Transition and the Demographic Challenge",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Government of Spain - Ministry for Ecological Transition and the Demographic Challenge",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Ministry of Finance (Planning Department) (Rahandusministeerium)",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    },
    {
      "@id": "[email protected]",
      "@type": "Organization",
      "name": "Centro Tecnologico del Mar - Fundación CETMAR",
      "email": "[email protected]",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    }
  ],
  "copyrightHolder": [],
  "user": [],
  "sourceOrganization": [],
  "publisher": [],
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://emodnet.ec.europa.eu/en/human-activities",
      "encodingFormat": "WWW:LINK-1.0-http--link",
      "name": "EMODnet Human Activities",
      "description": "EMODnet Human Activities aims to facilitate access to existing marine data on activities carried out in EU waters, by building a single entry point for geographic information on human uses of the ocean."
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wms?",
      "encodingFormat": "OGC:WMS-1.3.0-http-get-map",
      "name": "mspspatialplan",
      "description": "MSP Spatial Plan Areas"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wms?",
      "encodingFormat": "OGC:WMS-1.3.0-http-get-map",
      "name": "mspzoningline",
      "description": "MSP Zoning Lines"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wms?",
      "encodingFormat": "OGC:WMS-1.3.0-http-get-map",
      "name": "mspzoninglocs",
      "description": "MSP Zoning Locations"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wms?",
      "encodingFormat": "OGC:WMS-1.3.0-http-get-map",
      "name": "mspzoningpoly",
      "description": "MSP Zoning Areas"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wfs?",
      "encodingFormat": "OGC:WFS",
      "name": "emodnet:mspspatialplan",
      "description": "MSP Spatial Plan Areas"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wfs?",
      "encodingFormat": "OGC:WFS",
      "name": "emodnet:mspzoningline",
      "description": "MSP Zoning Lines"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wfs?",
      "encodingFormat": "OGC:WFS",
      "name": "emodnet:mspzoninglocs",
      "description": "MSP Zoning Locations"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu/wfs?",
      "encodingFormat": "OGC:WFS",
      "name": "emodnet:mspzoningpoly",
      "description": "MSP Zoning Areas"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://ows.emodnet-humanactivities.eu:/geonetwork/srv/api/records/5d89d371-a52a-476b-92de-a423f6d2c15d/attachments/EMODnet_HA_MSP_20230830.zip",
      "encodingFormat": "WWW:DOWNLOAD-1.0-http--download",
      "name": "EMODnet_HA_MSP_20230830.zip",
      "description": "ZIP (File Geodatabase / Shapefile)"
    }
  ],
  "encodingFormat": [
    "unknown"
  ],
  "spatialCoverage": [
    {
      "@type": "Place",
      "description": [],
      "geo": [
        {
          "@type": "GeoShape",
          "box": "-21.90 28.21 24.59 65.90"
        }
      ]
    }
  ],
  "temporalCoverage": [
    "2021-01-22/"
  ],
  "license": [
    {
      "@type": "CreativeWork",
      "name": "no limitation"
    }
  ],
  "prov:wasAttributedTo": {
    "@id": "https://catalogue.odis.org/view/364",
    "@type": "prov:Organization",
    "rdf:name": "European Marine Observation and Data Network catalogue",
    "rdfs:seeAlso": "https://emodnet.ec.europa.eu/"
  }
}
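The email-style @id values in a record like this one can be found with a short walk over the parsed JSON-LD. The Python sketch below treats only absolute http(s) URLs as acceptable identifiers, which is a simplifying assumption (other IRI schemes are also valid in JSON-LD). The helper names are hypothetical, and the addresses are placeholders because the real ones are masked in this thread.

```python
from urllib.parse import urlparse


def is_absolute_http_iri(value):
    """True for strings that parse as absolute http(s) URLs."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def suspicious_ids(node, path="$"):
    """Return (path, value) pairs for @id values that are not http(s) IRIs."""
    found = []
    if isinstance(node, dict):
        ident = node.get("@id")
        if isinstance(ident, str) and not is_absolute_http_iri(ident):
            found.append((path, ident))
        for key, value in node.items():
            found.extend(suspicious_ids(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(suspicious_ids(item, f"{path}[{index}]"))
    return found


# Cut-down record with placeholder email addresses.
record = {
    "@type": "Dataset",
    "@id": "https://emodnet.ec.europa.eu/geonetwork/srv/api/records/5d89d371-a52a-476b-92de-a423f6d2c15d",
    "author": [
        {"@id": "info@example.org", "@type": "Organization", "email": "info@example.org"}
    ],
    "provider": [
        {"@id": "a@example.org and b@example.org", "@type": "Organization"}
    ],
}

for path, value in suspicious_ids(record):
    print(path, value)
```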

pbuttigieg avatar Sep 26 '24 11:09 pbuttigieg