Global Change Master Directory mappings

Open cmungall opened this issue 4 years ago • 21 comments

See also #23

Are there plans for a mapping to GCMD? https://gcmdservices.gsfc.nasa.gov/kms/concepts/concept_scheme/sciencekeywords

cc @alisonboyer

cmungall avatar Sep 26 '19 23:09 cmungall

This is personally something I would like to see. So far I've been largely unable to motivate anyone on the GCMD side to take up the reins. That said, I let that deter me too easily.

I would be willing to assist.

lewismc avatar Sep 27 '19 02:09 lewismc

In the past, I would not consider doing GCMD mappings because there was no rigor in the terms, and no definitions, so it was very hard (or maybe idiosyncratic) to know what they mean. (Also, for a long time I couldn't get permission to put them in the repository.)

I haven't followed their work for the last 5 years or more though, so it could be a lot better now. And a lot of people use them, I am sure. So it would be great to (a) have them in the repository, and (b) have a mapping available. (I wonder if syntactic mappings might be particularly effective in this case?)

graybeal avatar Sep 27 '19 03:09 graybeal

Data versions of the GCMD instruments, platforms and science keywords do exist in COR... they are also served as linked data if you navigate to the base IRI. From what I understand these were added by @tbs1979 some time ago.

Clearly no automated process has been established for linking the above resources (or the GCMD modules more generally) to any other resource such as SWEET.

lewismc avatar Sep 27 '19 04:09 lewismc

I had been thinking about this previously and then was sidetracked by other work (well, the things I actually get paid to do). A few months ago I scraped the GCMD RDF files, as well as the NASA Thesaurus, from their respective sites so I could investigate locally.

Thus far, I have extracted the URIs and prefLabels (only) for each so I could do a string match/dictionary comparison to get an idea of scale. I realise this is a very low-tech and hacky way to address this type of problem, and there are likely better approaches. However, as neither GCMD nor SWEET has any real definitions (yet), syntactic matching may in fact be a valid first cut (as @graybeal already mentioned).
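
A minimal sketch of that kind of string-match first cut, assuming each vocabulary has been reduced to tab-delimited rows of URI and prefLabel. The column layout, the `example.org` URIs, and the function names are illustrative assumptions, not the contents of the actual files. Labels are compared after case-folding and whitespace normalisation only, so this finds exact label collisions and nothing subtler.

```python
import csv
import io


def normalise(label):
    """Case-fold and collapse whitespace so 'Sea  Ice' matches 'SEA ICE'."""
    return " ".join(label.split()).casefold()


def load_labels(tsv_text):
    """Map each normalised prefLabel to the list of URIs that carry it."""
    index = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        uri, label = row[0], row[1]
        index.setdefault(normalise(label), []).append(uri)
    return index


def exact_label_matches(sweet_tsv, gcmd_tsv):
    """Return (sweet_uri, gcmd_uri) pairs whose normalised labels are equal."""
    sweet = load_labels(sweet_tsv)
    gcmd = load_labels(gcmd_tsv)
    return [
        (s_uri, g_uri)
        for label in sorted(sweet.keys() & gcmd.keys())
        for s_uri in sweet[label]
        for g_uri in gcmd[label]
    ]


# Hypothetical sample rows, for illustration only.
SWEET_SAMPLE = "http://example.org/sweet/SeaIce\tsea ice\n"
GCMD_SAMPLE = "http://example.org/gcmd/c1\tSEA ICE\nhttp://example.org/gcmd/c2\tGLACIERS\n"

for pair in exact_label_matches(SWEET_SAMPLE, GCMD_SAMPLE):
    print(pair)
```

Because one label can map to several URIs on either side, the result is a list of candidate pairs for review rather than a one-to-one mapping.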

Unfortunately, GCMD has a lot of forward slash labels. As an example, one concept URI has the skos:prefLabel "NASA/GSFC/SED/ESD/LANDSAT/ED". There are many...idiosyncrasies...like this which I have not accounted for yet.

I had a repo with all the RDF files, scripts and CSV files together, but it only just occurred to me that NASA probably don't want all their RDF files sitting in an open repo on github. :) As such, I have removed all of that and put the two previously mentioned CSV files (tab delimited) in an open repo: https://github.com/brandonnodnarb/SWEET-mappings-staging.

If any of you find these useful, have at it.

@graybeal, I had to chuckle when you mentioned GCMD not having definitions...said the pot to the kettle :)

brandonnodnarb avatar Sep 27 '19 15:09 brandonnodnarb

One key question that arises when trying to make sense of GCMD is that quite a few terms appear in more than one place in the tree. Are they the same concept in different contexts?

dr-shorthair avatar Sep 29 '19 02:09 dr-shorthair

Generally yes... many of the cryosphere terms end up in two places - typically cryo and ocean for sea ice related terms.


rduerr avatar Sep 29 '19 05:09 rduerr

That was my thought also… I think there were a very few cases that I would call a 'mistake', in that the same term was used in two different places where it really meant something different. (So, a clear syntactic collision that was overlooked, or maybe acceptable since they were in different ontologies.) Those may have all been fixed up by now.

graybeal avatar Sep 29 '19 07:09 graybeal

But the same Short Name + Definition in different places has a different Number, UUID and Path, e.g.:

Number: 1806
Short Name: ABLATION
UUID: ad793d5e-b75d-4d3e-a542-ad4b4075b141
Definition: The process of removal of material from the surface of an object by vaporization, chipping, or other erosive processes. The term occurs in spaceflight associated with atmospheric reentry, in glaciology, medicine, and passive fire protection.
Path: EARTH SCIENCE|LAND SURFACE|GEOMORPHIC LANDFORMS/PROCESSES|GLACIAL PROCESSES|ABLATION

Number: 2485
Short Name: ABLATION
UUID: 99db4dca-4d07-48fd-8ba3-393532d04aa6
Definition: The process of removal of material from the surface of an object by vaporization, chipping, or other erosive processes. The term occurs in spaceflight associated with atmospheric reentry, in glaciology, medicine, and passive fire protection.
Path: EARTH SCIENCE|SOLID EARTH|GEOMORPHIC LANDFORMS/PROCESSES|GLACIAL PROCESSES|ABLATION

Number: 1382
Short Name: ABLATION ZONES/ACCUMULATION ZONES
UUID: 95fbaefd-1afe-4887-a1ba-fc338a8109bb
Definition: Pertaining to the reduction of a glacier due to melting and/or evaporation.
Path: EARTH SCIENCE|CRYOSPHERE|GLACIERS/ICE SHEETS|ABLATION ZONES/ACCUMULATION ZONES

Number: 2859
Short Name: ABLATION ZONES/ACCUMULATION ZONES
UUID: a994a6f6-cfcd-45d2-95a4-0f8455a9454d
Definition: Pertaining to the reduction of a glacier due to melting and/or evaporation.
Path: EARTH SCIENCE|TERRESTRIAL HYDROSPHERE|GLACIERS/ICE SHEETS|ABLATION ZONES/ACCUMULATION ZONES
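
This pattern can be checked mechanically across the whole keyword list. A rough sketch, assuming the export has been parsed into (number, short name, UUID, definition, path) tuples; the tuple layout and function name are assumptions, and only the first sentence of the ABLATION definition is kept here for space.

```python
from collections import defaultdict

ABLATION_DEF = (
    "The process of removal of material from the surface of an object "
    "by vaporization, chipping, or other erosive processes."
)
ZONES_DEF = "Pertaining to the reduction of a glacier due to melting and/or evaporation."

# (number, short name, UUID, definition, path), taken from the four rows above
ROWS = [
    (1806, "ABLATION", "ad793d5e-b75d-4d3e-a542-ad4b4075b141", ABLATION_DEF,
     "EARTH SCIENCE|LAND SURFACE|GEOMORPHIC LANDFORMS/PROCESSES|GLACIAL PROCESSES|ABLATION"),
    (2485, "ABLATION", "99db4dca-4d07-48fd-8ba3-393532d04aa6", ABLATION_DEF,
     "EARTH SCIENCE|SOLID EARTH|GEOMORPHIC LANDFORMS/PROCESSES|GLACIAL PROCESSES|ABLATION"),
    (1382, "ABLATION ZONES/ACCUMULATION ZONES", "95fbaefd-1afe-4887-a1ba-fc338a8109bb", ZONES_DEF,
     "EARTH SCIENCE|CRYOSPHERE|GLACIERS/ICE SHEETS|ABLATION ZONES/ACCUMULATION ZONES"),
    (2859, "ABLATION ZONES/ACCUMULATION ZONES", "a994a6f6-cfcd-45d2-95a4-0f8455a9454d", ZONES_DEF,
     "EARTH SCIENCE|TERRESTRIAL HYDROSPHERE|GLACIERS/ICE SHEETS|ABLATION ZONES/ACCUMULATION ZONES"),
]


def duplicated_concepts(rows):
    """Group rows by (short name, definition); keep groups with more than one UUID."""
    groups = defaultdict(list)
    for _number, name, uuid, definition, path in rows:
        groups[(name.strip(), definition.strip())].append((uuid, path))
    return {key: entries for key, entries in groups.items() if len(entries) > 1}


for (name, _definition), entries in duplicated_concepts(ROWS).items():
    # The second path segment is the branch under EARTH SCIENCE.
    branches = [path.split("|")[1] for _uuid, path in entries]
    print(name, "appears under:", branches)
```

Each group that comes back is a candidate for "one concept, several placements", which is the case the question above is asking about.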

dr-shorthair avatar Sep 29 '19 08:09 dr-shorthair

Hi All,

The logic behind why some of the GCMD science keywords appear in multiple places within the hierarchy (for example, 'Sea Ice' under 'Cryosphere' and 'Oceans') is that when users use the keyword facets in a search interface, they still find the keyword whichever discipline path they go down. In past user search behavior, cryospheric scientists might look for Sea Ice under Cryosphere, and oceanographers might look for Sea Ice under Oceans. We did not want users to "miss" the keyword when doing facet-type searching.

I wonder if this logic is becoming obsolete now with more advanced ontologies and search capabilities; however, the GCMD keywords are considered a controlled vocabulary and not necessarily a full ontology.

Your feedback is greatly appreciated.

Thanks,

Tyler Stevens KBR | Senior Discipline Engineer, NASA EED-2




tbs1979 avatar Sep 30 '19 12:09 tbs1979

I removed my last two comments (hid the first, deleted the second) as I had a brain fail and started talking about SWEET. So sorry!

graybeal avatar Sep 30 '19 12:09 graybeal

The attached image shows all concepts, and their definitions, with the label "STRATIGRAPHIC SEQUENCE". [image: GCMD_StratigraphicSequence]

For reference, SWEET currently has 'stratigraphic sequence' defined as a class, a subclass of 'history', without a natural language definition, in phenGeol.ttl.

One option would be to create a skos:related link between the SWEET concept and the GCMD concepts in a separate mapping file. There could also be a custom subproperty of skos:related defined that removes the symmetry of the original relation, i.e. SWEET:a skos:related GCMD:a would not entail GCMD:a skos:related SWEET:a as it would with skos:related.
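
To make the entailment question concrete, here is a small sketch of the two inference rules involved, written in plain Python rather than with an RDF library. The prefixed names `sweet:StratigraphicSequence`, `gcmd:StratigraphicSequence` and `map:relatedToExternal` are placeholders invented for illustration. SKOS declares `skos:related` symmetric, and `rdfs:subPropertyOf` propagates a statement up to the super-property, so under these two rules a subproperty assertion still yields `skos:related` in both directions. What the custom property avoids is a reverse assertion of the custom property itself. Whether that is enough depends on which rules a consumer actually applies.

```python
SYMMETRIC = {"skos:related"}
# Hypothetical custom mapping property declared as a subproperty of skos:related.
SUBPROPERTY_OF = {"map:relatedToExternal": "skos:related"}


def closure(triples):
    """Apply the subproperty and symmetry rules until nothing new is inferred."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in SUBPROPERTY_OF:
                new.add((s, SUBPROPERTY_OF[p], o))  # statement holds for the super-property
            if p in SYMMETRIC:
                new.add((o, p, s))  # symmetric property also holds in reverse
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


asserted = {
    ("sweet:StratigraphicSequence", "map:relatedToExternal", "gcmd:StratigraphicSequence"),
}
for triple in sorted(closure(asserted)):
    print(triple)
```

If the reverse `skos:related` statement is itself unwanted, a mapping property that is not a subproperty of `skos:related` would be needed; that is a design choice for the mapping file, not something this sketch decides.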

I suppose there could also be an rdfs:isDefinedBy relation from the SWEET class to one of the cited definitions, with a skos:related link between the others, but that doesn't seem wise at present.

Thanks for chiming in, @tbs1979. That's good info. You are correct, there don't need to be unique instances (unique IDs) of the same concept in order for it to participate in different hierarchies, facets, or whatever. That's sort of the point: if it's the same thing (by definition), the things should be...the same. :) (This last sentence was supposed to be sarcasm; I hope it comes through that way.)
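
As a sketch of that point, a single concept with one identifier can sit under several parents and still be found from either facet. The identifiers and data structure below are hypothetical and only mirror the 'Sea Ice' under 'Cryosphere' and 'Oceans' example from earlier in the thread; this is not how GCMD stores its keywords.

```python
# One record per concept; 'broader' may list several parents (a polyhierarchy).
CONCEPTS = {
    "c-earth": {"label": "EARTH SCIENCE", "broader": []},
    "c-cryo": {"label": "CRYOSPHERE", "broader": ["c-earth"]},
    "c-ocean": {"label": "OCEANS", "broader": ["c-earth"]},
    "c-seaice": {"label": "SEA ICE", "broader": ["c-cryo", "c-ocean"]},
}


def paths_to_root(concept_id, concepts=CONCEPTS):
    """Return every label path from a root down to the given concept."""
    node = concepts[concept_id]
    if not node["broader"]:
        return [[node["label"]]]
    paths = []
    for parent_id in node["broader"]:
        for path in paths_to_root(parent_id, concepts):
            paths.append(path + [node["label"]])
    return paths


def narrower(parent_id, concepts=CONCEPTS):
    """Return the ids of concepts listed directly under the given parent facet."""
    return sorted(cid for cid, c in concepts.items() if parent_id in c["broader"])


for path in paths_to_root("c-seaice"):
    print("|".join(path))
print(narrower("c-cryo"), narrower("c-ocean"))
```

Both facet paths are derived from the same record, so a mapping to SWEET would only need to target one identifier.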

Is there any capacity or will/want at NASA to modify or perhaps re-work GCMD?

brandonnodnarb avatar Sep 30 '19 12:09 brandonnodnarb

Yes, thanks for chiming in, @tbs1979. We captured information relating to the above in CMRQ-2485. At the time, this was feedback from the ESIP 2017 conference. In June '19 this ticket and its child tickets were marked as Deferred, so I assumed that no work was being done here.

@tbs1979 would you be open to having a dedicated GCMD session/session track at the upcoming ESIP Winter meeting to address the issues highlighted above?

lewismc avatar Sep 30 '19 17:09 lewismc

@lewismc In regards to CMRQ-2485, this work has not been set as a high priority by ESDIS, so it may not get worked on for a while. I will relay your interest in this back to ESDIS.

In regards to a GCMD session on keywords, we have had sessions in the past regarding the topics, so I don't know if there is anything new to add until some of the enhancements are made on our end. Let me see what our plans are for the Winter ESIP meeting and interest in mappings to SWEET. I think there would be some benefit there, but need to look at the LOE.

tbs1979 avatar Oct 01 '19 12:10 tbs1979

Thanks @tbs1979

> this work has not been set as a high priority by ESDIS

Understood. I think this is because we've not properly communicated that this is important for some ongoing initiatives. If we were to communicate it then it may be escalated.

> we have had sessions in the past regarding the topics

Yes and I've attended a few of them. I think this issue concerns a different part of GCMD though. This is a focused effort which aims to achieve something very specific...

lewismc avatar Oct 01 '19 15:10 lewismc

@lewismc and all. Perhaps we can discuss some of your ideas and issues about the GCMD keywords at an upcoming telecon before we bring it to the broader ESIP community? When is your next committee telecon?

tbs1979 avatar Oct 02 '19 12:10 tbs1979

@tbs1979 the next committee meeting is 4th Tuesday in October - 2019-10-22

SemTech Monthly Telecon

    4th Tuesday of each month at 4pm Eastern
    GoToMeeting: https://www.gotomeeting.com/join/976796333
    Phone Access: United States: +1 (872) 240-3212
    Access Code: 976-796-333 

lewismc avatar Oct 02 '19 22:10 lewismc

@lewismc Do you want a short discussion of the GCMD keywords on the agenda for that meeting?

tbs1979 avatar Oct 03 '19 12:10 tbs1979

@tbs1979 I'm not quite understanding... Do you want to chat beforehand? Or do you want to dedicate time at the meeting? Please clarify.

Basically, on our end (the GCMD contributor and consumer community) we have been, for some time, providing guidance to you (the GCMD developers and maintainers) on how the service could better meet the needs of the community. I think this information flow has at times been ad hoc and has therefore lost its focus and emphasis. I feel that if you and the GCMD decision makers attend the ESIP SemTech meeting and receive feedback from the community, you would be in a better position to 1) focus and document the feedback, and 2) get a better idea of what you should perhaps prioritize moving forward.

Really, what we are trying to do is align SWEET with GCMD. Right now, due to how GCMD is structured, it is more difficult than it needs to be. I hope this clarifies.

lewismc avatar Oct 03 '19 17:10 lewismc

@lewismc A few of us can attend the next SemTech Monthly Telecon to get the discussion started if that is ok with you. We can discuss some of the activities and a path forward. Can you send me an invite to the telecon? Thanks.

tbs1979 avatar Oct 04 '19 12:10 tbs1979

Done @tbs1979

lewismc avatar Oct 04 '19 17:10 lewismc

Was there any update on this?

An update from me. As part of the NMDC project we are mapping GCMD to ENVO and other OBOs:

https://github.com/microbiomedata/nmdc-metadata/issues/59

As a start we have made a repo for the mapping pipeline here:

https://github.com/EnvironmentOntology/obo-to-gcmd-mapping

Our focus is OBO to GCMD, but we'll also align SWEET; then we can combine with the SWEET-OBO mappings for mutual consistency...
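
One way to read "combine for mutual consistency" is to compose the two GCMD mappings and compare the result with a direct SWEET-OBO mapping. The sketch below assumes each mapping is a set of (subject, object) pairs treated as rough equivalences; all identifiers are placeholders, and this is not the pipeline in the linked repository.

```python
def compose_via_gcmd(sweet_to_gcmd, obo_to_gcmd):
    """Derive candidate (sweet, obo) pairs that share a GCMD concept."""
    obo_by_gcmd = {}
    for obo_id, gcmd_id in obo_to_gcmd:
        obo_by_gcmd.setdefault(gcmd_id, set()).add(obo_id)
    return {
        (sweet_id, obo_id)
        for sweet_id, gcmd_id in sweet_to_gcmd
        for obo_id in obo_by_gcmd.get(gcmd_id, ())
    }


def compare(derived, direct):
    """Split pairs into those both sources agree on and those only one has."""
    return {
        "agree": derived & direct,
        "only_derived": derived - direct,
        "only_direct": direct - derived,
    }


# Placeholder identifiers, for illustration only.
sweet_to_gcmd = {("sweet:A", "gcmd:1"), ("sweet:B", "gcmd:2")}
obo_to_gcmd = {("obo:X", "gcmd:1"), ("obo:Y", "gcmd:2")}
sweet_to_obo = {("sweet:A", "obo:X"), ("sweet:B", "obo:Z")}

report = compare(compose_via_gcmd(sweet_to_gcmd, obo_to_gcmd), sweet_to_obo)
for bucket, pairs in report.items():
    print(bucket, sorted(pairs))
```

Pairs that land in `only_derived` or `only_direct` are not necessarily errors; they are the places where the three mappings disagree and a curator would want to look.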

cmungall avatar Apr 27 '20 19:04 cmungall