odis-arch

Connect initial GOOS ERDDAP endpoint with ODIS

Open jmckenna opened this issue 1 year ago • 8 comments

  • sample catalogue (27 datasets) : https://osmc.noaa.gov/erddap/info/index.html
  • has embedded JSON-LD on the page (execute a view-source), listing type: DataCatalog and type: Dataset
  • the sameAs property is used to point to the individual dataset's JSON-LD
  • sitemap does exist for the endpoint: https://osmc.noaa.gov/erddap/sitemap.xml
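
For anyone who wants to check the embedded markup without doing a view-source by hand, here is a minimal sketch using only the Python standard library. It assumes the JSON-LD sits in a `<script type="application/ld+json">` element, which is the usual way to embed it; the names `JsonLdExtractor` and `extract_jsonld` are just illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive here as raw text; keep them only inside a JSON-LD block
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return a list of parsed JSON-LD documents found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

Feeding it the source of the catalogue page should return one document whose `@type` is `DataCatalog`, if the assumption about the script element holds.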

To-Do

  • add sitemap to ODIS config
  • check nightly report on the sitemap resources
  • try to harvest into ODIS' Solr and front-end

@fils @pbuttigieg my findings differ from our earlier discussions (I reviewed how the NMDIS-China partner set up their ERDDAP endpoint, and it seems to match the NOAA endpoint). Am I misunderstanding the desired steps here? Please explain.

Paste of the JSON-LD that is embedded above

{
  "@context": "http://schema.org",
  "@type": "DataCatalog",
  "name": "ERDDAP Data Server at OSMC",
  "url": "https://osmc.noaa.gov/erddap",
  "publisher": {
    "@type": "Organization",
    "name": "OSMC",
    "address": {
      "@type": "PostalAddress",
      "addressCountry": "USA",
      "addressLocality": "7600 Sand Point Way NE, Seattle",
      "addressRegion": "WA",
      "postalCode": "98115"
    },
    "telephone": "+1 206-555-1212",
    "email": "[email protected]",
    "sameAs": "http://www.osmc.noaa.gov"
  },
  "fileFormat": [
    "application/geo+json",
    "application/json",
    "text/csv"
  ],
  "isAccessibleForFree": "True",
  "dataset": [
    {
      "@type": "Dataset",
      "name": "CCHDO GO SHIP bottle data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/cchdo_bottle/index.html"
    },
    {
      "@type": "Dataset",
      "name": "CCHDO GO SHIP ctd data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/cchdo_ctd/index.html"
    },
    {
      "@type": "Dataset",
      "name": "Global Drifter Program - 1 Hour Interpolated QC Drifter Data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/drifter_hourly_qc/index.html"
    },
    {
      "@type": "Dataset",
      "name": "Global Drifter Program - 6 Hour Interpolated QC Drifter Data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/drifter_6hour_qc/index.html"
    },
    {
      "@type": "Dataset",
      "name": "IOOS GTS counts",
      "sameAs": "https://osmc.noaa.gov/erddap/info/ioos_obs_counts/index.html"
    },
    {
      "@type": "Dataset",
      "name": "JASL/UHSLC Research Quality Tide Gauge Data (daily)",
      "sameAs": "https://osmc.noaa.gov/erddap/info/global_daily_rqds/index.html"
    },
    {
      "@type": "Dataset",
      "name": "JASL/UHSLC Research Quality Tide Gauge Data (hourly)",
      "sameAs": "https://osmc.noaa.gov/erddap/info/global_hourly_rqds/index.html"
    },
    {
      "@type": "Dataset",
      "name": "JCOMMPS Active WMO ID LIST",
      "sameAs": "https://osmc.noaa.gov/erddap/info/wmo_list/index.html"
    },
    {
      "@type": "Dataset",
      "name": "meop animal profiles",
      "sameAs": "https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC 90 day RT data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_30day/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC Argo Profile data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_PROFILERS/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC flattened observations from GTS",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_flattened/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC flattened observations from GTS",
      "sameAs": "https://osmc.noaa.gov/erddap/info/osmc_test/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC normalized observations from GTS",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_Points/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC Profiles",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMCV4_DUO_PROFILES/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC surface trajectory data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMCV4_DUO_SURFACE_TRAJECTORY/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC TimeSeries data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMCV4_DUO_TIME_SERIES/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Air Temperature",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyAirt/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Sea Surface Temperature",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDySst/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Temperature",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyT/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Wind",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyW/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1987-present, Salinity",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyS/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1988-2020, ADCP",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyAdcp/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1989-present, Wind Stress",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyTau/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1992-present, Sea Surface Salinity",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDySss/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1997-present, Precipitation",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyRain/index.html"
    }
  ]
}

jmckenna Oct 10 '23 18:10

@fils @pbuttigieg I wonder if in this case, my framing script could grab the sameAs values and harvest the individual dataset's JSON-LD, for the entire catalogue.

Or, let me know if I am misunderstanding the planned path through this.
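
A rough sketch of what such a temporary script could look like, assuming the catalogue JSON-LD has already been parsed into a dict shaped like the paste above. The function names and the injectable `fetch` argument are hypothetical, not part of any existing ODIS or Gleaner tooling.

```python
import urllib.request


def dataset_pages(catalog):
    """Return (name, sameAs) pairs for each Dataset listed in a DataCatalog dict."""
    datasets = catalog.get("dataset", [])
    if isinstance(datasets, dict):  # JSON-LD allows a single object instead of a list
        datasets = [datasets]
    return [
        (d.get("name"), d["sameAs"])
        for d in datasets
        if d.get("@type") == "Dataset" and "sameAs" in d
    ]


def fetch_html(url):
    """Download one page as text (network access happens only when this is called)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def harvest_pages(catalog, fetch=fetch_html):
    """Map each dataset's sameAs URL to the HTML of its metadata page."""
    return {url: fetch(url) for _name, url in dataset_pages(catalog)}
```

The JSON-LD would still have to be pulled out of each returned page's source. Passing `fetch` in as an argument makes it easy to try the logic offline with a stub.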

jmckenna Oct 10 '23 19:10

Ah, the issue is that Gleaner is currently not able to get the JSON-LD.

(I wonder if a temporary script as I mention above could be used for the short-term)

jmckenna Oct 10 '23 20:10

So the sitemap (https://osmc.noaa.gov/erddap/sitemap.xml) points to a document with 4 entries for MEOP (as an example):

<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.html</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

<url>
<loc>https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.graph</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.subset</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

From Kevin, we have the following

Data: https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.html
Metadata: https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html

The schema.org markup is in the "view-source" on the metadata page.

So we could start a basic index with that.
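
As a sketch of that starting point: the sitemap could be filtered down to just the metadata pages before indexing. This assumes every metadata page follows the `/erddap/info/<datasetID>/index.html` pattern seen in the MEOP example; `metadata_urls` is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern for per-dataset metadata pages, based on the MEOP entries above
INFO_PAGE = re.compile(r"/erddap/info/[^/]+/index\.html$")


def metadata_urls(sitemap_xml):
    """Return the <loc> values in a sitemap that look like dataset metadata pages."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Compare local names so this works with or without the sitemap namespace
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            url = element.text.strip()
            if INFO_PAGE.search(url):
                urls.append(url)
    return urls
```

For the four MEOP entries this keeps only the `info/MEOP_profiles/index.html` URL and drops the `.html`, `.graph`, and `.subset` data pages.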

fils Oct 11 '23 13:10

  • added OSMC endpoint into ODIS config through https://github.com/iodepo/odis-arch/commit/4bbd46843c728b5b0935584bff40f97f10b1800f

jmckenna Oct 12 '23 12:10

Some raw comments from our meeting today:

issue in sitemap - 1800 records, but only 27 datasets. Lots of stuff is auto-populated in it that doesn't seem to need to be there. Action: create a sitemap index (a sitemap which points to other sitemaps), and then have one sitemap for datasets, one for util pages, etc., so you can direct ODIS and others to the ones you want us to harvest. This is an efficiency issue, not a show stopper: Gleaner ignores whatever doesn't have JSON-LD.
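
To illustrate the sitemap index idea, here is a small sketch that builds one with the standard sitemaps.org namespace. The child sitemap file names in the usage note are hypothetical, and this says nothing about whether ERDDAP can be configured to produce such files itself.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document (as a string) pointing at the given sitemaps."""
    # Serialize the sitemap namespace as the default xmlns rather than an ns0: prefix
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(root, encoding="unicode")
```

For example, `build_sitemap_index(["https://osmc.noaa.gov/erddap/sitemap-datasets.xml", "https://osmc.noaa.gov/erddap/sitemap-util.xml"])` would give harvesters a datasets-only sitemap to point at (both file names are made up for the example).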

includedInDataCatalog: "sameAs" should be replaced with the "url" property

keywords - way too many. This looks like a misunderstanding of how to use this property well. The keywords should be informative, not a general grab of lots of random things. Focus on only those that are really about the dataset.

if "headline" is indeed mapping to the dataset id in ERDDAP, use "identifier" instead

use the "about" property for a focused descriptor: this should contain really the main topic of this data set

"license": being mapped from right place, but weird values. GOOS may provide a list of licenses to recommend.

For any variables not measured use an array of additionalProperty properties: so for codes, assigned status, QC flags, etc.

Are the description and alternate name coming from the same place in the NetCDF? Many of these are not very descriptive. A long name is not a description. A comment isn't either, but it's closer.

species (L189) in the variableMeasured block has no value - with no "value" property, we take the "we don't know the value" position.

"conventions" (really it's the syntax or format) and "axisOrDataVariable" are very GOOS-specific abstractions - this hurts cross-platform discovery, and a relatively quick fix can make these more FAIR. On conventions, it would be best to link to the documentation of the convention.

creator - too short, need full names; sameAs --> url
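
A sketch of the two mechanical renames from these notes (sameAs to url on includedInDataCatalog and creator, headline to identifier), applied to a dataset's JSON-LD as a Python dict. It assumes "headline" really does carry the ERDDAP dataset id, which is still an open question above; the function names are illustrative, and the keyword, license, and description points need human judgment rather than code.

```python
def _same_as_to_url(node):
    """Rename sameAs to url on one node (or each node in a list), leaving other keys alone."""
    if isinstance(node, list):
        return [_same_as_to_url(item) for item in node]
    if not isinstance(node, dict) or "sameAs" not in node or "url" in node:
        return node
    renamed = {key: value for key, value in node.items() if key != "sameAs"}
    renamed["url"] = node["sameAs"]
    return renamed


def apply_review_notes(dataset):
    """Return a revised copy of a Dataset dict; the input dict is not modified."""
    revised = dict(dataset)
    # Assumption: headline holds the ERDDAP dataset id, so identifier fits better
    if "headline" in revised and "identifier" not in revised:
        revised["identifier"] = revised.pop("headline")
    for key in ("includedInDataCatalog", "creator"):
        if key in revised:
            revised[key] = _same_as_to_url(revised[key])
    return revised
```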

pbuttigieg Nov 15 '23 15:11

@kevin-obrien I've harvested the OSMC endpoint, and I can see the 27 records in a development instance of the front-end search for ODIS (see screencaptures below).

We currently have an issue displaying the bounding boxes on a map, related to the large spatial extents of these records (I was temporarily able to view them by tweaking the mapping code, but an entire rewrite of that mapping code is being done now). Question: are you OK if I publish these 27 records to the live search (oceaninfohub.org), knowing that they won't be displayed in the "Spatial Search" yet?

Screenshot 2023-12-07 075638 Screenshot 2023-12-07 075730

Screenshot 2023-12-07 075806

jmckenna Dec 07 '23 12:12

@kevin-obrien the OSMC records are now visible on the ODIS live search (!). Give it a try, here is a direct link just to those dataset records: https://oceaninfohub.org/results/Dataset?page=0&facet_query=facetType%3Dtxt_provider%26facetName%3DObserving%2BSystem%2BMonitoring%2BCenter%2B%2528OSMC%2529

jmckenna Apr 07 '24 17:04

@kevin-obrien it would be good if you can also create an entry in the ODIS Catalogue for this OSMC endpoint:

  • log in with your Ocean Expert ID
    • important fields are
      • Startpoint URL for ODIS-Arch (this is the URL of your sitemap.xml file)
        • for OSMC, it is: https://osmc.noaa.gov/erddap/sitemap.xml
      • Type of the ODIS-Arch URL (select "Sitemap")

jmckenna Apr 07 '24 17:04

An ODISCat entry was created by @kevin-obrien : https://catalogue.odis.org/view/3307

jmckenna Dec 02 '24 20:12