
Harvest Pensoft Journals

Open teodorgeorgiev opened this issue 1 year ago • 10 comments

Sitemap links:

https://bdj.pensoft.net/sitemap/marine-articles-index.xml
https://zookeys.pensoft.net/sitemap/marine-articles-index.xml
https://phytokeys.pensoft.net/sitemap/marine-articles-index.xml
https://biorisk.pensoft.net/sitemap/marine-articles-index.xml
https://neobiota.pensoft.net/sitemap/marine-articles-index.xml
https://natureconservation.pensoft.net/sitemap/marine-articles-index.xml
https://zse.pensoft.net/sitemap/marine-articles-index.xml
https://nl.pensoft.net/sitemap/marine-articles-index.xml
https://riojournal.com/sitemap/marine-articles-index.xml
https://italianbotanist.pensoft.net/sitemap/marine-articles-index.xml
https://rethinkingecology.pensoft.net/sitemap/marine-articles-index.xml
https://mbmg.pensoft.net/sitemap/marine-articles-index.xml
https://biss.pensoft.net/sitemap/marine-articles-index.xml
https://jor.pensoft.net/sitemap/marine-articles-index.xml
https://travaux.pensoft.net/sitemap/marine-articles-index.xml
https://neotropical.pensoft.net/sitemap/marine-articles-index.xml
https://herpetozoa.pensoft.net/sitemap/marine-articles-index.xml
https://vcs.pensoft.net/sitemap/marine-articles-index.xml
https://plecevo.eu/sitemap/marine-articles-index.xml
https://aquaticinvasions.arphahub.com/sitemap/marine-articles-index.xml

JSON-LD is embedded in each article (e.g. https://bdj.pensoft.net/article/128431).
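
For anyone wanting to try the harvest path by hand, here is a minimal sketch of the two parsing steps: listing the `<loc>` URLs in a sitemap and pulling the embedded JSON-LD out of an article page. It assumes the sitemaps use the standard sitemaps.org namespace and that the JSON-LD sits in a `<script type="application/ld+json">` element. The function names are illustrative, fetching is left to the caller, and this is not how gleaner itself is implemented.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: the sitemaps follow the standard sitemaps.org schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap or sitemap index."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


class _JsonLdExtractor(HTMLParser):
    """Collects the parsed content of application/ld+json script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return all JSON-LD documents embedded in an HTML page."""
    parser = _JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

Calling `sitemap_locs` on each sitemap above and `extract_jsonld` on each downloaded article page would give the records to review.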

teodorgeorgiev avatar Sep 16 '24 13:09 teodorgeorgiev

@teodorgeorgiev many thanks!

Notes to ODIS team

@jmckenna @fils - this is a link to Pensoft journals, who have created marine subsets of their assets for ODIS.

@jmckenna - you'll note they use the @ScholarlyArticle type, which is a subtype of @Article. This would need a new facet on the frontend and likely some SOLR config if that hasn't been automated yet.

First-pass review of content

I'm taking the first example to review: https://bdj.pensoft.net/article/969

In the Schema.org validator, we get: https://validator.schema.org/#url=https%3A%2F%2Fbdj.pensoft.net%2Farticle%2F969

Some comments:

Escaped characters

Escaped characters, such as the \/ sequences in "@context": "https:\/\/schema.org" and the \n newlines, should be removed. These will cause compatibility issues downstream.
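
A minimal sketch of one way to produce the cleaner form, assuming the JSON-LD is available as a string: standard JSON parsers decode `\/` to `/`, so parsing and re-serialising removes those escapes, and a recursive strip removes leading and trailing newlines from string values. The function names are illustrative, not part of any ODIS tooling.

```python
import json


def _strip_strings(value):
    """Recursively strip leading/trailing whitespace from all string values."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [_strip_strings(item) for item in value]
    if isinstance(value, dict):
        return {key: _strip_strings(item) for key, item in value.items()}
    return value


def normalise_jsonld(raw: str) -> str:
    """Parse a JSON-LD string and re-serialise it without \\/ escapes
    or stray whitespace around string values."""
    doc = json.loads(raw)  # "\/" is decoded to "/" here
    return json.dumps(_strip_strings(doc), ensure_ascii=False, indent=2)
```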

@id

The @id of the whole document could be the URL to the landing page with the JSON-LD embedded in it. That would let parsers like gleaner know where to get the JSON-LD.

URL

    "url": "https:\/\/bdj.pensoft.net\/",

This should rather be the URL to the landing page of the paper, the same as your mainEntityOfPage value.
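
The @id and url suggestions can be sketched together as follows. This assumes mainEntityOfPage already holds the landing page URL, either as a plain string or as an object with an "@id" or "url" key; the helper names are hypothetical.

```python
def landing_page_url(doc: dict):
    """Return the landing page URL from mainEntityOfPage, or None if absent."""
    page = doc.get("mainEntityOfPage")
    if isinstance(page, dict):
        page = page.get("@id") or page.get("url")
    return page if isinstance(page, str) else None


def point_at_landing_page(doc: dict) -> dict:
    """Return a copy of doc whose @id and url both point at the landing page."""
    url = landing_page_url(doc)
    if url is None:
        return dict(doc)  # nothing to go on, leave the record unchanged
    return {**doc, "@id": url, "url": url}
```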

Keywords and newlines

    "keywords": "\n  Lapland, faunistics, mayflies, aapamires, ponds,\n",
  • The newlines will cause issues and should be removed. These escaped characters are found in many fields.
  • The keyword list should be an array. In its current form, all of the keywords above will be understood as a single keyword.

This should look like:

    "keywords": [ "Lapland", "faunistics", "mayflies", "aapamires", "ponds"]

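A minimal sketch of that conversion, assuming individual keywords never contain commas themselves (the function name is illustrative):

```python
def split_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string into a clean list,
    dropping surrounding whitespace, newlines, and empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```

Applied to the value above, the trailing comma and newlines do not produce an empty keyword.
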
Semantically qualified identifiers

There's nothing "wrong" with this:

    "identifier": {
        "@type": "PropertyValue",
        "@propertyID": "DOI",
        "value": "10.3897\/BDJ.1.e969"
    },

but it would be better as:

    "identifier": {
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "url": "https://doi.org/10.3897/BDJ.1.e969",
        "value": "10.3897/BDJ.1.e969"
    }
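
If the stanza is generated from a stored DOI, something like the sketch below could build it. The function name is hypothetical and the prefix handling is an assumption about how DOIs might be stored; the property values follow the suggestion above.

```python
def doi_identifier(doi: str) -> dict:
    """Build a semantically qualified PropertyValue stanza for a DOI."""
    doi = doi.strip()
    # Assumption: stored DOIs may carry a resolver or "doi:" prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return {
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "url": f"https://doi.org/{doi}",
        "value": doi,
    }
```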

Content location

@jmckenna @fils - this is the superproperty of what we usually harvest spatial data from (i.e. spatialCoverage): "The location depicted or described in the content. For example, the location in a photograph or painting."

This is valid, and the ODIS stack may need to make some tweaks to accommodate all subproperties of the expected properties in its processing.

Price

@teodorgeorgiev you may wish to add the priceCurrency property to stanzas like this one.

    "offers": {
        "@type": "Offer",
        "price": "5.10",
        "availability": "https:\/\/schema.org\/InStock",
        "description": "Order Printed version"
    },
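
A small sketch of adding that property. Schema.org expects priceCurrency in the three-letter ISO 4217 format; the actual currency for these offers is not stated in this thread, so it is left as a parameter, and the function name is hypothetical.

```python
def with_price_currency(offer: dict, currency: str) -> dict:
    """Return a copy of an Offer stanza with priceCurrency set.

    `currency` must be supplied by the publisher; this only checks
    that it looks like a three-letter ISO 4217 code.
    """
    code = currency.strip().upper()
    if len(code) != 3 or not code.isalpha():
        raise ValueError(f"not a three-letter currency code: {currency!r}")
    return {**offer, "priceCurrency": code}
```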

Pensoft as an Org

I would add some more information about Pensoft in these stanzas, like the website, address, etc.

@teodorgeorgiev it may be worth 1) creating an Organization-typed JSON-LD doc for Pensoft and 2) using the @id JSON-LD keyword to link to it via a PID (DOI, W3ID, etc). This way, you can abbreviate:

    "publisher": {
        "@type": "Organization",
        "name": "Pensoft Publishers",
        "logo": {
            "@type": "ImageObject",
            "url": "https:\/\/pensoft.net\/new_images\/pensoft_logo.svg"
        }
    },

to

    "publisher": {
        "@id": "https://pid-of-choice/pid-of-Pensoft-JSON-LD-Doc"
    },

You can do that for any repetitive elements if you are sure to issue persistent identifiers that are stable. If that's not the case, it's fine to embed the information in each JSON-LD doc as you've done.
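
A sketch of both options, assuming a PID has been issued for the Pensoft document (the PID below is the placeholder from above, not a real identifier, and the function name is hypothetical). By default the embedded details are kept alongside the @id; passing `keep_embedded=False` yields the abbreviated form.

```python
def link_publisher(doc: dict, publisher_pid: str, keep_embedded: bool = True) -> dict:
    """Return a copy of doc whose publisher stanza carries an @id.

    With keep_embedded=False the stanza is reduced to the @id alone.
    """
    publisher = doc.get("publisher")
    embedded = dict(publisher) if keep_embedded and isinstance(publisher, dict) else {}
    return {**doc, "publisher": {**embedded, "@id": publisher_pid}}
```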

pbuttigieg avatar Sep 16 '24 14:09 pbuttigieg

@teodorgeorgiev

PS: if the articles mention or link out to sources like OBIS, GBIF, INSDC, or others, you can add identifiers of those records using the citation property.

You may also want to add the creditText property to your records, which would contain the recommended citation text of the asset described by the JSON-LD. e.g.

"creditText": "Salmela J, Savolainen E (2013) New records of Paraleptophlebia werneri Ulmer, 1920 and P. strandii (Eaton, 1901) from Finland (Ephemeroptera, Leptophlebiidae). Biodiversity Data Journal 1: e969. https://doi.org/10.3897/BDJ.1.e969"

pbuttigieg avatar Sep 16 '24 14:09 pbuttigieg

@pbuttigieg I have resolved your comments except the one about the Organization. I prefer to leave it as it is.

teodorgeorgiev avatar Sep 17 '24 09:09 teodorgeorgiev

Thanks @teodorgeorgiev

Perhaps the last optimisation is to create a sitemap index (itself a sitemap that points to other sitemaps) so you can then maintain a single ODISCat entry for Pensoft Journals.

This sitemap index would include all the links in your original post and can be changed on your side with few or no additional steps on the IODE side.

Here's an example from Pacific Data Hub

https://pacificdata.org/organization/sitemap.xml
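
For illustration, a sitemap index following the sitemaps.org protocol could be generated roughly like this. The function name is hypothetical and the optional `<lastmod>` element is omitted; this is a sketch, not a description of how Pensoft or Pacific Data Hub build theirs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls) -> bytes:
    """Serialise a sitemap index that points at each given sitemap URL."""
    # Emit the sitemaps.org namespace as the default xmlns.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Passing the list of journal sitemap URLs from the original post would yield a single document to register.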

pbuttigieg avatar Sep 17 '24 09:09 pbuttigieg

@pbuttigieg Sure, here it is: https://pensoft.net/marine-sitemap.xml

teodorgeorgiev avatar Sep 17 '24 11:09 teodorgeorgiev

Thanks @teodorgeorgiev - please create (if you haven't) the OceanExpert entries (for yourself and for Pensoft; the latter can be used as an identifier value in your Organization stanzas for Pensoft) and the ODISCat entries to initiate the harvest, as described in: https://book.odis.org/gettingStarted.html

This was an exemplary implementation path @jmckenna - to be used for training / coaching in future

pbuttigieg avatar Sep 17 '24 11:09 pbuttigieg

@pbuttigieg just did:

https://oceanexpert.org/expert/71704 https://oceanexpert.org/institution/24685

I have added this identifier to the "Organization". However, I would also like to embed the information in each JSON-LD for consistency (the journals can have a different publisher than Pensoft).

teodorgeorgiev avatar Sep 17 '24 12:09 teodorgeorgiev

> @pbuttigieg just did:
>
> https://oceanexpert.org/expert/71704 https://oceanexpert.org/institution/24685

Great - give the validation process a couple of days and then you can use those to log in to ODISCat and register something like "Pensoft Journals - marine content source"

> I have added this identifier to the "Organization". However, I would also like to embed the information in each JSON-LD for consistency (the journals can have a different publisher than Pensoft).

Yes, that makes sense. The two can reinforce one another.

pbuttigieg avatar Sep 17 '24 12:09 pbuttigieg

@pbuttigieg please check https://catalogue.odis.org/view/3312

Once it is live, our PR officer Iva Boyadzhieva ([email protected]) will prepare a press release about this integration. It would be great if we could coordinate this with your PR department. Do you know who we should contact about it?

teodorgeorgiev avatar Sep 17 '24 15:09 teodorgeorgiev

@teodorgeorgiev I'll link you to our team via email - we have some press materials and can coordinate announcements

pbuttigieg avatar Sep 17 '24 16:09 pbuttigieg