Make eml better and more useful for updates vs. initial publication

Open Jegelewicz opened this issue 3 years ago • 14 comments

I've added Dusty to the Associated Parties.

@dustymc can we get this put into the eml generator? You were added as technical contact, but it doesn't end up in the eml.

Originally posted by @Jegelewicz in https://github.com/ArctosDB/data-migration/issues/480#issuecomment-917184676

Jegelewicz avatar Sep 10 '21 20:09 Jegelewicz

@dustymc what information do you need?

Jegelewicz avatar Feb 07 '22 21:02 Jegelewicz

Big-picture, some conversation with Dave led me to believe that the EML isn't being used - is it?

If it is in fact useful, then I need to know what exactly you want added - there are a couple of things mentioned here, one's already there, I'm not sure how the other should be handled.

dustymc avatar Feb 07 '22 22:02 dustymc

EML isn't being used - is it?

Yes it is - currently ONLY when collections are first published to the IPT, BUT he may be able to use it for updates in future. This needs testing, and I am waiting for the MOU with Dave to be finished before I ask him to do anything. Once that is done, maybe a meetup would be best.

Jegelewicz avatar Feb 08 '22 22:02 Jegelewicz

Met with @dbloom yesterday to discuss eml improvements to make his life easier. See the worksheet comparing what we provide with what ends up in the IPT (after manual changes from Dave and from collection review). There are some things our eml could be doing better. I will see what I can do about it and then ask for help. We will use the ASNHC collection manager changes to test.

Jegelewicz avatar Dec 30 '22 16:12 Jegelewicz

Also, note new capability at https://ipt.gbif.org/manual/en/ipt/latest/gbif-metadata-profile#what-changed-in-version-1-1-of-the-gmp-since-1-0-2 In particular, I think we could provide:

  • machine readable license. Note instructions on how to provide a machine readable license can be found here. (see https://github.com/ArctosDB/arctos/issues/5425 and https://github.com/ArctosDB/arctos/issues/5426)

  • multiple contacts, creators, metadataProvider and project personnel (we currently provide creator and metadata contacts, suggest adding technical (programmers Dusty & Dave see https://github.com/ArctosDB/arctos/issues/3919#issuecomment-1368055106) and administrative contacts)

  • userIds for any agent (e.g. ORCID, Wikidata) (this is already baked in for ORCiD, can we add Wikidata? see https://github.com/ArctosDB/arctos/issues/3919#issuecomment-1368051953)

  • providing information about the frequency with which changes are made to the dataset (we do this already)

  • providing a project identifier (e.g. to associate datasets under a common project) (add Arctos project for all collections?)

  • The description can be broken into separate paragraphs versus all lumped into one

  • Note that collections can also provide a logo with their metadata

<resourceLogoUrl>http://ipt.ala.org.au/logo.do?r=global</resourceLogoUrl>

Suggestions for additions/changes from my discussion with Dave.

  • Add the Arctos url for the collection to additional identifiers (see https://github.com/ArctosDB/arctos/issues/3919#issuecomment-1368039508)
  • Add the IPT identifier to additional identifiers (see also https://github.com/ArctosDB/arctos/issues/5291)
  • Add any dataset that this replaces (see also https://github.com/ArctosDB/arctos/issues/5291)
  • Add the Organization for ALL contacts (REQUIRED)
  • Add Dave and Dusty as contacts for ALL collections as technical support (see https://github.com/ArctosDB/arctos/issues/3919#issuecomment-1368055106)
  • Select taxa from taxonomy instead of using the current free text field and allow for selection of multiple taxa (then pull the rank from preferred taxonomic source)
  • If nothing is provided in coordinates use "global" (westBoundingCoordinate -180, eastBoundingCoordinate 180, northBoundingCoordinate 90, southBoundingCoordinate -90)
  • Add "VertNet IPT Norms: http://vertnet.org/resources/norms.html" to all collection "additionalInfo" (see https://github.com/ArctosDB/arctos/issues/3919#issuecomment-1368050116)
  • We need to coordinate "packageid", as we are currently out of sync with what is at VertNet and this needs updating every time significant changes are made to the metadata. I don't know the best way to do this.
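
The "global" fallback for missing coordinates suggested above could work roughly as follows. This is only a Python sketch of the logic, not the Arctos eml generator (which is CFML). The function name `bounding_coordinates` and the dict input are hypothetical, and treating a partially filled box the same as an empty one is an assumption, not something settled in this thread.

```python
from xml.sax.saxutils import escape

# Global extent, using the element names and values from the list above.
GLOBAL_BOUNDS = {
    "westBoundingCoordinate": -180,
    "eastBoundingCoordinate": 180,
    "northBoundingCoordinate": 90,
    "southBoundingCoordinate": -90,
}


def bounding_coordinates(coords=None):
    """Return a boundingCoordinates fragment, defaulting to the global extent.

    `coords` is a hypothetical dict keyed by the four element names.
    Assumption: if any of the four values is missing or blank, the whole
    box falls back to "global" rather than mixing real and default values.
    """
    coords = coords or {}
    values = {tag: coords.get(tag) for tag in GLOBAL_BOUNDS}
    if any(v is None or str(v).strip() == "" for v in values.values()):
        values = dict(GLOBAL_BOUNDS)
    lines = ["<boundingCoordinates>"]
    for tag, value in values.items():
        lines.append(f"\t<{tag}>{escape(str(value))}</{tag}>")
    lines.append("</boundingCoordinates>")
    return "\n".join(lines)


if __name__ == "__main__":
    # No coordinates recorded for the collection: emits the global box.
    print(bounding_coordinates())
```

Keeping the default in one constant means the fallback only has to be changed in one place if the agreed "global" values ever differ.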

If we can get the eml to include everything that is needed at the IPT, then we can schedule an annual update of all Arctos collections. Once the GBIF-->GrSciColl connections are made (in process now!), updates in Arctos metadata will be able to be automatically (well almost) made at the VertNet IPT, those changes will propagate to GBIF, and then to GrSciColl, meaning that collection managers only need to update collection metadata in Arctos (and perhaps review it at the IPT to ensure everything makes it there as expected). This will make it much easier for first time publishers and for the times when staff changes or other significant changes in the metadata need to be updated everywhere. We are very close here and I'd like to see this get done if at all possible!

Jegelewicz avatar Dec 30 '22 16:12 Jegelewicz

Suggest adding this per Dave's recommendation (see Google Sheet)

<cfset eml=eml & chr(10) & chr(9) & chr(9) & chr(9) & chr(9) & '<alternateIdentifier>#application.serverRootURL#/collection/#d.guid_prefix#</alternateIdentifier>'>

Jegelewicz avatar Dec 30 '22 18:12 Jegelewicz

Suggest changing per Dave's recommendation

<cfset eml=eml & chr(10) & chr(9) & '<additionalInfo>'>
<cfset eml=eml & chr(10) & chr(9) & chr(9) & '<para>'>
<cfset eml=eml & chr(10) & chr(9) & chr(9) & chr(9) & '#EncodeForXML(d.collection_terms_display)#: #EncodeForXML(d.collection_terms_uri)#'>
<cfset eml=eml & chr(10) & chr(9) & chr(9) & '</para>'>
<cfset eml=eml & chr(10) & chr(9) & '</additionalInfo>'>

Change to

<cfset eml=eml & chr(10) & chr(9) & '<additionalInfo>'>
<cfset eml=eml & chr(10) & chr(9) & chr(9) & '<para>'>
<cfset eml=eml & chr(10) & chr(9) & chr(9) & chr(9) & '#EncodeForXML(d.collection_terms_display)#: #EncodeForXML(d.collection_terms_uri)#; VertNet IPT Norms: http://vertnet.org/resources/norms.html'>
<cfset eml=eml & chr(10) & chr(9) & chr(9) & '</para>'>
<cfset eml=eml & chr(10) & chr(9) & '</additionalInfo>'>

Jegelewicz avatar Dec 30 '22 18:12 Jegelewicz

Suggest adding for Wikidata IDs

<cfquery name="thisQ" dbtype="query">
	select address from aa where  agent_id=#da.agent_id# and address_type='Wikidata'
</cfquery>
<cfloop query="thisQ">
	<cfset x=x & chr(10) & btbs & chr(9) & '<userId directory="https://www.wikidata.org/wiki/">#EncodeForXML(thisQ.address)#</userId>'>
</cfloop>

Jegelewicz avatar Dec 30 '22 18:12 Jegelewicz

Add Dave and Dusty as contacts for ALL collections as technical support

Maybe instead of doing this through collection contacts, we just hard code this into the eml? This should be the same for all collections published through Arctos, then we won't have to rely on collections adding the appropriate technical contacts (or can we somehow just add them for all collections?) Hardcoding would mean exactly one thing needs to be changed if either Dave or Dusty's information changes. Suggest adding the following:

<associatedParty>
	<individualName>
		<givenName>David</givenName>
		<surName>Bloom</surName>
	</individualName>
	<organizationName>VertNet</organizationName>
	<positionName>VertNet Coordinator</positionName>
	<electronicMailAddress>[email protected]</electronicMailAddress>
	<onlineUrl>http://www.vertnet.org</onlineUrl>
	<role>programmer</role>
</associatedParty>
<associatedParty>
	<individualName>
		<givenName>Dusty</givenName>
		<surName>McDonald</surName>
	</individualName>
	<organizationName>Arctos</organizationName>
	<positionName>Arctos Programmer</positionName>
	<electronicMailAddress>[email protected]</electronicMailAddress>
	<onlineUrl>https://arctosdb.org/</onlineUrl>
	<role>programmer</role>
</associatedParty>

Jegelewicz avatar Dec 30 '22 19:12 Jegelewicz

Add any dataset that this replaces

Any time a new version of metadata is published, the metadata it replaces is identified like this:

<dc:replaces>2aa02cf8-c402-412c-9ad1-585e1e185bef/v1.38.xml</dc:replaces>

It seems like adding these identifiers to the collections would make sense and would be just one more other identifier a collection might have.

Jegelewicz avatar Dec 30 '22 19:12 Jegelewicz

It seems like adding these identifiers to the collections would make sense and would be just one more other identifier a collection might have.

That sounds like a good improvement to me, just thinking of an archive/tracking situation. We have a number of collections that have already been published in old formats, and now are coming on as Arctos material. If this helps with that transition I think it would be a good idea.

ewommack avatar Jan 10 '23 16:01 ewommack

Hey Teresa,

An FYI that the ALMNH Taxonomic Coverage in the EML lists 25 taxonomic names and defines them all as "phylum". There are two issues: 1. They all wind up in the same line in the IPT and 2. they are not all phyla, there is at least one class and one genus in there.

I think that's an easy fix if you want things to populate the IPT directly, but it means they need 25 individual lines. I'm sure there are some other collections that do this in Arctos too.

Just another item to add to the list of things to sync with metadata.

d

  1. I don't really see a way to ensure that the taxa selected match the rank - ugh
  2. Write eml to parse comma separated taxon names to individual lines?
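
A rough sketch of option 2, splitting the free-text field into one classification per name. This is Python for illustration only, not the CFML generator. The function name `taxonomic_classifications` and the `rank_lookup` dict are hypothetical stand-ins for however ranks would actually be resolved, and leaving out `taxonRankName` when no rank is known is an assumption that should be checked against what the IPT accepts.

```python
from xml.sax.saxutils import escape


def taxonomic_classifications(names, rank_lookup=None):
    """Turn "Chordata, Aves, Puma" into one taxonomicClassification per name.

    `rank_lookup` is a hypothetical name -> rank mapping. Names without a
    known rank get no taxonRankName instead of a guessed one, so a class
    or genus is never mislabeled as "phylum".
    """
    rank_lookup = rank_lookup or {}
    blocks = []
    for raw in names.split(","):
        name = raw.strip()
        if not name:
            continue  # skip empty entries such as "Aves,,Puma"
        parts = ["<taxonomicClassification>"]
        rank = rank_lookup.get(name)
        if rank:
            parts.append(f"\t<taxonRankName>{escape(rank)}</taxonRankName>")
        parts.append(f"\t<taxonRankValue>{escape(name)}</taxonRankValue>")
        parts.append("</taxonomicClassification>")
        blocks.append("\n".join(parts))
    return "\n".join(blocks)


if __name__ == "__main__":
    ranks = {"Chordata": "phylum", "Aves": "class"}
    print(taxonomic_classifications("Chordata, Aves, Puma", ranks))
```

This does not solve item 1: the rank is only as good as the lookup, so names that cannot be matched still need review.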

Jegelewicz avatar Feb 13 '23 23:02 Jegelewicz

from Dave

Looks like this in the IPT: [image]

This suggests that we need a re-do of this section of manage collection. Instead of one place for rank and one for taxon names, we should allow multiple entries, just as in the IPT above. This could have the advantage of linking Scientific Name to names in the list of taxa.

Jegelewicz avatar Feb 14 '23 14:02 Jegelewicz

See also https://github.com/ArctosDB/arctos/issues/5691 - collection attributes would provide a ~~dumping ground~~ semi-structured place for WHATEVER, I think including everything mentioned above.

dustymc avatar Sep 05 '23 16:09 dustymc

I don't think there are any action items here, closing.

dustymc avatar Aug 27 '24 00:08 dustymc