
Relationship to PROV-O

Open jamesamcl opened this issue 5 years ago • 11 comments

Not for the first version, but it would be interesting to think about how SSSOM relates to PROV-O. For example, would mappings have prov:wasGeneratedBy properties? Could ontology alignment be a prov:Activity?

jamesamcl avatar May 13 '20 14:05 jamesamcl

Yes, looking at how to best capture provenance is important. In the Open PHACTS project, we used VoID for that. You really want to capture more detail like who captures this, using what, from what, when, and using which versions. Typically that will be layered. Like these mappings were created based on the mappings available at that resource, and the resource itself should indicate how they created those.

Chris-Evelo avatar Aug 11 '20 08:08 Chris-Evelo

Some examples from a Slack conversation with @cmungall: mymappings was-generated-by [a CurationActivity; timestarted...]

matentzn avatar Jan 15 '21 19:01 matentzn

Would someone be able to work on this? I think concretely, it would be nice to at least document how the two relate, perhaps even implement them in sssom-py

matentzn avatar May 31 '21 17:05 matentzn

I have to review PROV-O from a process ontology development angle. But to be of help with respect to SSSOM, I'd need to read a 2 pager on what SSSOM goals/capabilities are, down to the entity type - is there such a thing? Is the question what OBOFoundry entities map to PROV-O, or what data structure PROV-O has that SSSOM needs? [Edit: I see it's the latter that you are after; ... now I found link to SSSOM.md]

ddooley avatar Jun 03 '21 17:06 ddooley

pls assign (at least one person anyway) to @mbrush

mellybelly avatar Jun 03 '21 17:06 mellybelly

I'll note that my PROV-O review is limited to process related terms, but SOSA is also reviewed in the same light, both with respect to OBO in this table. It reveals mapping cases that it sounds like SSSOM strives to describe.

ddooley avatar Jul 21 '21 15:07 ddooley

The task for the workshop is to determine which properties and best practices from PROV should be adopted in SSSOM, and which SSSOM metadata elements should be mapped to prov: metadata elements. Overall, we need to ensure that we can model mapping provenance in a standard, reliable (yet minimal) way, and that we have not missed anything important.

matentzn avatar Sep 01 '21 14:09 matentzn

Proposal below. The essence is to allow granularity where people need it, and to keep the sssom tsv simple, with the most useful parts of the prov payload denormalized as sssom fields.

  • Add an optional column was_generated_by, with slot_uri pointing to the prov property
  • The range of this would be a CURIE denoting a (named) instance of a prov activity
  • The CURIE prefix MUST be registered in the header
  • the expanded URL SHOULD resolve, and MAY resolve to both computable and human readable info via conneg
  • the prov activity MAY be included as YAML-LD in the header
  • If a prov activity is provided, it must be consistent with the projected sssom fields
    • mapping_tool = activity.wasAssociatedWith, range SoftwareAgent
    • creator_id = activity.wasAssociatedWith, range Person
    • subj,obj source = activity.used
    • the prov object MAY include wasDerivedFrom between subj/obj sources and the sssom mapping set
    • etc
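
The consistency rule in the proposal above could be checked roughly as follows. This is only a sketch under assumptions: the dict layout for the activity, the helper name `check_projection`, the example CURIEs, and the reading of "subj,obj source" as the `subject_source` and `object_source` fields are all illustrative and not part of SSSOM or sssom-py.

```python
# Hypothetical sketch: neither this data layout nor this helper exists in sssom-py.

def check_projection(row, activities, prefix_map):
    """Return a list of problems for one mapping row and its prov activity."""
    problems = []
    curie = row.get("was_generated_by")
    if curie is None:
        return problems  # the column is optional

    # The CURIE prefix MUST be registered in the header.
    prefix = curie.split(":", 1)[0]
    if prefix not in prefix_map:
        problems.append(f"prefix '{prefix}' is not registered in the header")

    # The activity MAY be embedded in the header; without it there is nothing to compare.
    activity = activities.get(curie)
    if activity is None:
        return problems

    agents = activity.get("wasAssociatedWith", [])
    tools = {a["id"] for a in agents if a["type"] == "SoftwareAgent"}
    people = {a["id"] for a in agents if a["type"] == "Person"}
    used = set(activity.get("used", []))

    # Projected fields must agree with the activity when both are present.
    if "mapping_tool" in row and row["mapping_tool"] not in tools:
        problems.append("mapping_tool does not match a SoftwareAgent of the activity")
    if "creator_id" in row and row["creator_id"] not in people:
        problems.append("creator_id does not match a Person of the activity")
    for field in ("subject_source", "object_source"):
        if field in row and row[field] not in used:
            problems.append(f"{field} is not listed in activity.used")
    return problems


# Illustrative data only; all identifiers below are made up.
prefix_map = {"ex": "https://example.org/"}
activities = {
    "ex:activity1": {
        "wasAssociatedWith": [
            {"id": "ex:tool1", "type": "SoftwareAgent"},
            {"id": "ex:person1", "type": "Person"},
        ],
        "used": ["ex:sourceA", "ex:sourceB"],
    }
}
good_row = {
    "was_generated_by": "ex:activity1",
    "mapping_tool": "ex:tool1",
    "creator_id": "ex:person1",
    "subject_source": "ex:sourceA",
    "object_source": "ex:sourceB",
}
bad_row = dict(good_row, mapping_tool="ex:other_tool")

good_problems = check_projection(good_row, activities, prefix_map)
bad_problems = check_projection(bad_row, activities, prefix_map)
```

With this data, `good_problems` is empty and `bad_problems` reports the mismatched `mapping_tool`. Rows without `was_generated_by` are never flagged, which keeps the column optional as proposed.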

cmungall avatar Sep 02 '21 18:09 cmungall

Hi all. I don't have a deep understanding of the requirements/use cases related to provenance for SSSOM, but Chris' proposal above sounds reasonable. Can't make the workshop, but happy to follow up later and review any concrete examples/proposals that come out of it.

mbrush avatar Sep 02 '21 20:09 mbrush

Other impressions from workshop:

  • Look at vocabs like https://github.com/sifrproject/MOD-Ontology/blob/master/mod-v1.4_properties_template.ttl for properties commonly used
  • There are perhaps multiple activities involved here:
    • mapping creation
    • mapping review
    • mapping reconciliation (activity on mapping set level)
  • Define Inputs and outputs of activities?
  • (Will add more in a bit)

matentzn avatar Sep 08 '21 14:09 matentzn

from @graybeal

  • Check 20-30 provenance related properties that could be relevant: list
    • list of most-recommended terms as a template: https://github.com/sifrproject/MOD-Ontology/blob/master/mod-v1.4_properties_template.ttl

matentzn avatar Sep 24 '21 10:09 matentzn

This is now described in our recent OM 2022 report. The gist: the mapping_justification is a PROV:Activity that confirms/generates the Mapping. Agents acting on Mappings include people (author_id) and tools (mapping_tool).
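
One possible, literal reading of that gist, sketched as plain (subject, predicate, object) tuples. The helper name `mapping_to_prov_triples`, the mapping identifier, and the example values are assumptions for illustration; the report itself remains the authoritative description of the model.

```python
# Hypothetical sketch: a literal reading of the gist, not the model from the report.

def mapping_to_prov_triples(mapping_id, row):
    """Project one mapping row onto PROV-O style triples."""
    activity = row["mapping_justification"]
    triples = [
        (activity, "rdf:type", "prov:Activity"),
        (mapping_id, "prov:wasGeneratedBy", activity),
    ]
    # People acting on the mapping (author_id is treated as a list here).
    for author in row.get("author_id", []):
        triples.append((author, "rdf:type", "prov:Person"))
        triples.append((activity, "prov:wasAssociatedWith", author))
    # Tools acting on the mapping.
    tool = row.get("mapping_tool")
    if tool:
        triples.append((tool, "rdf:type", "prov:SoftwareAgent"))
        triples.append((activity, "prov:wasAssociatedWith", tool))
    return triples


# Illustrative row; the mapping identifier and the ORCID are placeholders.
example_row = {
    "mapping_justification": "semapv:ManualMappingCuration",
    "author_id": ["orcid:0000-0000-0000-0000"],
}
example_triples = mapping_to_prov_triples("ex:mapping1", example_row)
```

Here the justification value itself stands in for the activity node; a fuller model might instead mint a separate activity instance per mapping and type it with the justification.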

matentzn avatar Sep 28 '22 15:09 matentzn