connect INCOIS catalogue as ODIS node

Open jmckenna opened this issue 2 years ago • 10 comments

  • catalogue home: https://incois.gov.in/essdp/
  • sample metadata record
  • catalogue architecture (text found in help files):
      "Metadata portal is developed using Java enterprise technologies and is deployed on Apache Tomcat web application server. MySQL database is used for the archival of the metadata information."
    
    Features:
    - ISO 19115 standards compliant representation of metadata information
    - GCMD Science Keywords for controlled keyword search
    - Spatial, Temporal, Keyword & Free Text Search
    - Simple interface for metadata submission, update and search
    - Java EE technologies based cross platform solution
    

Possible next steps:

  • INCOIS team to embed JSON-LD on each record page in the catalogue
  • INCOIS to create a sitemap.xml pointing to each record, and place the sitemap on the web
  • INCOIS to create an entry in the ODIS Catalogue
    • important fields are
      • Startpoint URL for ODIS-Arch (this is the url to your sitemap.xml file)
      • Type of the ODIS-Arch URL (select "Sitemap")

jmckenna avatar Feb 06 '24 21:02 jmckenna

Update from INCOIS team:

The following steps for INCOIS to become an ODIS node on OIH have been completed.

1. Modify ViewMetadata page to generate and include JSON-LD for each metadata entry → Embedded JSON-LD in the view metadata page 
2. Create sitemap.xml including links to all metadata entries → https://incois.gov.in/essdp/xml/sitemap.xml

We can proceed with the third step [ODISCat entry].

jmckenna avatar Apr 15 '24 11:04 jmckenna

Latest feedback for the INCOIS team regarding the ODISCat entry, sitemap, and the embedded JSON-LD:

  • the ODISCat entry looks good, the ODIS-arch fields are set properly

  • the JSON-LD in some records does not validate, such as in this record

    • you can use the schema.org validator to check the JSON-LD embedded in your pages

    • the creator property has an extra comma at the end, causing an error:

        "creator": [
          {
            "@type": "Role",
            "roleName": "PointOfContact",
            "creator": {
              "@type": "Person",
              "name": "Johnny Konjarla",
              "jobTitle": "Project Scientist",
              "email": "johnny.konjarla [ at ] gmail.com",
              "telephone": "9949836662",
              "affiliation": {
                "@type": "Organization",
                "name": "Centre for Marine Living Resources and Ecology (CMLRE)"
              },
              "address": {
                "@type": "PostalAddress",
                "streetAddress": "CMLRE, Atal Bhavan, LNG Road, Puthuvypin South,Ochanthuruthu PO",
                "addressLocality": "Kochi",
                "addressRegion": "Kerala",
                "postalCode": "682508",
                "addressCountry": "IND"
              }
            }
          },         <---------- here is the extra comma
        ]
      
  • here is another record whose JSON-LD fails to validate

    • notice the extra comma in the address property:

          "address": {
            "@type": "PostalAddress",
            ,                                            <-------------------extra comma 
            "addressLocality": "Vasco-da-Gama",
            "addressRegion": "Goa",
            "postalCode": "403804",
            "addressCountry": "IND"
          }
      
  • the <meta> and <link> HTML tags are not closed in the pages, causing validation errors.

    <meta charset="utf-8">
    <link href="essdp4/assets/vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">
    

    should instead be:

    <meta charset="utf-8" />
    <link href="essdp4/assets/vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet" />
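
For the JSON-LD errors earlier in this comment, a local syntax check before publishing might look like the following minimal sketch. It uses only the Python standard library; the names `JsonLdExtractor` and `check_jsonld` are made up for illustration and are not part of Gleaner or any ODIS tooling. It only checks that each embedded block is well-formed JSON (so it would catch the extra commas), not that the content is valid schema.org.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html):
    """Return (block index, line, column, message) for each block that is not valid JSON."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    errors = []
    for index, block in enumerate(extractor.blocks):
        try:
            json.loads(block)
        except json.JSONDecodeError as exc:
            errors.append((index, exc.lineno, exc.colno, exc.msg))
    return errors
```

Calling `check_jsonld(page_html)` on the HTML of a record page returns an empty list when every embedded block parses; line and column numbers in the result are relative to the start of the JSON-LD block, not the page.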
    

jmckenna avatar May 28 '24 15:05 jmckenna

@pbuttigieg, a reminder of my question about the creator property with an embedded creator, raised in the pull request at https://github.com/iodepo/odis-in/pull/14 (the snippet in my previous comment here shows what I mean).

jmckenna avatar May 28 '24 16:05 jmckenna

Thanks for the quick fixes by the INCOIS team.

There is another error in the HTML: the <img> tag is not closed, see this record

  <img src="images/logo.png" alt="" class="img-fluid" width="320" height="120" >

should instead be:

  <img src="images/logo.png" alt="logo" class="img-fluid" width="320" height="120" />

Please also add a value for alt (see above).

jmckenna avatar May 29 '24 10:05 jmckenna

We're really close with this one. It seems that the INCOIS team could benefit from validation checks for their HTML.

@fils does invalid HTML affect Gleaner? Or can the ODIS harvest ignore this?

pbuttigieg avatar Jun 01 '24 07:06 pbuttigieg

Latest results when trying to harvest from the INCOIS sitemap:

  • Gleaner reports the error "syntax error on line 289: unquoted or missing attribute value in element" for the record https://incois.gov.in/essdp/ViewMetadata?fileid=5f7f56a5-2868-4baf-acc4-d37595774ce2
    • this appears to be caused by <input type=button, which should instead be <input type="button"

jmckenna avatar Jun 03 '24 20:06 jmckenna

Similar to other HTML tags mentioned above, that same input tag needs to be closed: <input type="button" ... />
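
To find other unquoted attribute values before the next harvest attempt, something like the sketch below could be run against a saved copy of a record page. It relies on Python's standard `html.parser`; the function name `find_unquoted_attributes` and the regular expression are illustrative assumptions, and this is not how Gleaner itself parses pages.

```python
import re
from html.parser import HTMLParser

# Matches name=value pairs; quoted values are consumed whole, so an "="
# inside a quoted value is never mistaken for a separate attribute.
ATTR_RE = re.compile(r"""([^\s"'<>/=]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>]+)""")


class UnquotedAttrFinder(HTMLParser):
    """Records every attribute whose value is written without quotes."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # the start tag exactly as written
        line, _ = self.getpos()
        for match in ATTR_RE.finditer(raw):
            name, value = match.groups()
            if value[0] not in "\"'":
                self.problems.append((line, tag, name, value))


def find_unquoted_attributes(html):
    """Return (line, tag, attribute, value) tuples for unquoted attribute values."""
    finder = UnquotedAttrFinder()
    finder.feed(html)
    finder.close()
    return finder.problems
```

For a page containing `<input type=button value="Go">`, the result would include an entry for the `type` attribute of the `input` tag. The check says nothing about unclosed tags, which need to be fixed separately.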

jmckenna avatar Jun 04 '24 14:06 jmckenna

Update needed from the INCOIS team:

  • change the syntax in your sitemap.xml file
    • remove mentions of sitemapindex, so that your new sitemap looks like:
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
          <loc>https://incois.gov.in/essdp/ViewMetadata?fileid=5f7f56a5-2868-4baf-acc4-d37595774ce2</loc>
          <lastmod>2024-05-30</lastmod>
        </url>
        <url>
          <loc>https://incois.gov.in/essdp/ViewMetadata?fileid=64e955b6-bba3-4176-ac97-7b5f543bf0c7</loc>
          <lastmod>2024-05-30</lastmod>
        </url>
        <url>
          <loc>https://incois.gov.in/essdp/ViewMetadata?fileid=a818c658-251c-4fb8-934b-1ec91e3995f7</loc>
          <lastmod>2024-05-29</lastmod>
        </url>
        ...
      </urlset>
      
    
    

Generally, a sitemapindex is only used when you have over 50k records. This is also described in our ODIS Book.
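
If the sitemap is generated by a script, a flat urlset like the one above could be produced along these lines. This is a minimal sketch using Python's standard library; the function name `build_sitemap` and the input format (a list of URL and date pairs) are assumptions for illustration, and the namespace string is simply the one shown in the example in this comment.

```python
import xml.etree.ElementTree as ET

# Namespace value copied from the example in this thread.
SITEMAP_NS = "https://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries, namespace=SITEMAP_NS):
    """Build a flat <urlset> sitemap from (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", {"xmlns": namespace})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset)  # pretty-print; requires Python 3.9 or later
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    records = [
        ("https://incois.gov.in/essdp/ViewMetadata?fileid=5f7f56a5-2868-4baf-acc4-d37595774ce2", "2024-05-30"),
        ("https://incois.gov.in/essdp/ViewMetadata?fileid=64e955b6-bba3-4176-ac97-7b5f543bf0c7", "2024-05-30"),
    ]
    print(build_sitemap(records))
```

Because the root element is always `urlset`, the output never contains a `sitemapindex` wrapper.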

jmckenna avatar Jun 10 '24 18:06 jmckenna

  • additional changes needed from the INCOIS team:
    • in the sitemap.xml file, change line 2 to: <urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">

      (notice the https in the link, instead of http)

    • in this record

      • on line 93, change the text Synedra(\) to Synedra(\\)

        (in other words, the backslash must be escaped in the JSON-LD)
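
One way to avoid escaping problems like this is to let a JSON library serialize the values instead of assembling the JSON-LD by string concatenation. The catalogue itself is Java-based, so the Python below is only a sketch of the idea; the property name `name` is a placeholder, since the actual property on line 93 is not shown here.

```python
import json

# The text as it should appear to a reader: Synedra(\)
# (the backslash is doubled here only because this is a Python string literal).
value = "Synedra(\\)"

# json.dumps escapes the backslash, so the serialized form contains Synedra(\\).
encoded = json.dumps({"name": value})
print(encoded)

# Round-tripping gives back the original text.
assert json.loads(encoded)["name"] == value

# Writing the backslash unescaped produces invalid JSON, which is the error on the page.
raw = '{"name": "Synedra(\\)"}'  # the JSON text is {"name": "Synedra(\)"}
try:
    json.loads(raw)
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False
```

Any JSON serializer in the catalogue's own language should apply the same escaping rules, and would also rule out trailing-comma errors.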

jmckenna avatar Jun 11 '24 12:06 jmckenna

@jmckenna looks like the dashboard reports all datasets from the Marine Science and Oceans facets of the INCOIS resource (1047)

pbuttigieg avatar Nov 14 '24 11:11 pbuttigieg