Several questions: Mapping non-standardized data models

Open • joeflack4 opened this issue 1 year ago • 26 comments

Overview

@matentzn We went over some of this this morning regarding the SSSOM::FHIR_ConceptMap and LinkML_Enum::FHIR_ValueSet work I'm doing. @stephanieshong is doing her own mapping work, e.g. between PCORnet and OMOP, and would like to use SSSOM if that is advisable.

Questions

  1. What if the data model has no canonical URI?
  2. What if the data model has a canonical URI, but no resolvable URIs for concepts?
  3. What if only concept labels exist, not concept IDs?
  4. What if a data model uses a value set that is defined elsewhere, such as PCORnet using a US CDC recommended gender value set?

Working minimal example

SSSOM non-standard data model mapping.tsv.zip

# curie_map:
#   PCORNET: 'url/to/pcornet'
#   OMOP: 'url/to/omop'
#   fhir-gender: 'http://hl7.org/fhir/administrative-gender'
#   skos: 'http://www.w3.org/2004/02/skos/core#'
subject_id	subject_label	predicate_id	object_id	object_label
OMOP:demographics.sex.m	Male	skos:exactMatch	OMOP:MALE	Male
OMOP:demographics.sex.m	Male	skos:exactMatch	fhir-gender:male	Male
fhir-gender:male	Male	skos:exactMatch	OMOP:???	Male

As the identifier OMOP:demographics.sex.m above shows, we were thinking of using dot notation: the rightmost segment would be the code, and everything before it would be the path. The problem is that the path is ambiguous. Sometimes it could mean <table>.<field>, as in SQL, and sometimes a JSON or XML path, and I think data models vary enough that the meaning of a dot-notation path could be very ambiguous.
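To make the ambiguity concrete, here is a minimal Python sketch of the dot-notation convention proposed above, assuming the rightmost segment is always the code and everything between the prefix and that segment is the path. The names `PathParts` and `split_dot_curie` are hypothetical and not part of SSSOM or any library. The parser can recover the segments, but not what they mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathParts:
    """Pieces of a dot-notation CURIE (hypothetical convention)."""

    prefix: str  # e.g. "OMOP"
    path: tuple  # e.g. ("demographics", "sex"); what the path means is model-specific
    code: str    # e.g. "m"


def split_dot_curie(curie: str) -> PathParts:
    """Split 'PREFIX:seg1.seg2.code' into prefix, path segments and code.

    The rightmost dot-separated segment is taken as the code and everything
    before it as the path. Raises ValueError if any piece is missing.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    *path, code = local_id.split(".")
    if not path or not code or not all(path):
        raise ValueError(f"expected PREFIX:path.code, got {curie!r}")
    return PathParts(prefix=prefix, path=tuple(path), code=code)


if __name__ == "__main__":
    parts = split_dot_curie("OMOP:demographics.sex.m")
    print(parts)
    # Nothing in the identifier says whether the path is SQL <table>.<field>,
    # a JSON path, or an XML path; a consumer has to be told out of band.
    print("as SQL:", ".".join(parts.path), "=", repr(parts.code))
```

Because the path segments carry no type information, any tool consuming such IDs would still need a separate, per-model declaration of what the path means.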

joeflack4, Aug 17 '22

Thanks @joeflack4 @matentzn. More simply, I was wondering how to correctly specify gender mapping information in SSSOM, with FHIR as the source and OMOP as the target. We have much more value set mapping information that we can contribute from N3C. I'd like to clarify and add details on the issues mentioned above. In PCORnet, the gender value is specified in a table column, and the following can be used as the defining URL: https://phinvads.cdc.gov/vads/ViewValueSet.action?id=06D34BBC-617F-DD11-B38D-00188B398520

  • PCORnet: PCORnet:demographics.sex.M, PCORnet:demographics.sex.F. PCORnet gender values are specified in: https://phinvads.cdc.gov/vads/ViewValueSet.action?id=06D34BBC-617F-DD11-B38D-00188B398520
  • FHIR: gender value sets are specified in the defining URL http://hl7.org/fhir/administrative-gender or in https://build.fhir.org/codesystem-administrative-gender.html
  • OMOP: male is defined in https://athena.ohdsi.org/search-terms/terms/8507 and female in https://athena.ohdsi.org/search-terms/terms/8532
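A rough sketch of what FHIR-to-OMOP gender rows could look like, assuming the `fhir-gender` prefix from the example above and an `OMOP` prefix whose local IDs are the Athena concept IDs 8507 and 8532 linked above. The `semapv:ManualMappingCuration` justification, the function names, and the choice to leave FHIR `other` and `unknown` unmapped (no OMOP IDs are given for them here) are assumptions for illustration only.

```python
import csv
import io

# FHIR administrative-gender codes and displays (http://hl7.org/fhir/administrative-gender).
FHIR_GENDER = {"male": "Male", "female": "Female", "other": "Other", "unknown": "Unknown"}

# OMOP concept IDs from the Athena links; only male and female are given there,
# so "other" and "unknown" are deliberately left unmapped in this sketch.
FHIR_TO_OMOP = {"male": ("8507", "Male"), "female": ("8532", "Female")}

COLUMNS = ["subject_id", "subject_label", "predicate_id",
           "object_id", "object_label", "mapping_justification"]


def gender_rows():
    """Build SSSOM-style rows plus a list of FHIR codes that have no target."""
    rows, unmapped = [], []
    for code, label in FHIR_GENDER.items():
        target = FHIR_TO_OMOP.get(code)
        if target is None:
            unmapped.append(code)
            continue
        concept_id, concept_label = target
        rows.append({
            "subject_id": f"fhir-gender:{code}",
            "subject_label": label,
            "predicate_id": "skos:exactMatch",
            "object_id": f"OMOP:{concept_id}",
            "object_label": concept_label,
            "mapping_justification": "semapv:ManualMappingCuration",
        })
    return rows, unmapped


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows, unmapped = gender_rows()
    print(to_tsv(rows), end="")
    print("unmapped FHIR codes:", unmapped)
```

Keeping the unmapped codes visible, rather than silently dropping them, makes it obvious when a value set mapping is not onto.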

I'd like clarification on what values to use for the following columns in the TSV file:

  • subject_id
  • subject_label
  • predicate_id
  • object_id
  • object_label
  • confidence
  • comment
  • mapping_justification
  • mapping_date
  • author_id
  • subject_source_version
  • object_source_version

and for the following fields in the YAML file:

  • mapping_set_id:
  • license:
  • mapping_set_version: "2022-08-17"
  • mapping_set_description: "Manually curated alignment of FHIR gender to OMOP gender concept_id. Intended to be used for ontological analysis and grouping of concept mapping in N3C and in Vulcan."
  • object_source:
  • subject_source:
  • curie_map:
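A minimal sketch for the column and YAML questions. It assumes, based on our reading of the SSSOM spec (please check the current schema), that `mapping_set_id` and `license` are the required mapping-set fields and that `subject_id`, `predicate_id`, `object_id` and `mapping_justification` are the required per-mapping fields; the other fields listed above are optional. The `mapping_set_id` and `license` values are hypothetical placeholders, and the helper functions are illustrative, not sssom-py API.

```python
import json

# Required slots as we read the SSSOM spec; verify against the current schema.
REQUIRED_SET_FIELDS = ("mapping_set_id", "license")
REQUIRED_MAPPING_FIELDS = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def missing_required(metadata, rows):
    """List required fields that are absent or empty in the metadata or any row."""
    problems = [f"metadata.{f}" for f in REQUIRED_SET_FIELDS if not metadata.get(f)]
    for i, row in enumerate(rows, start=1):
        problems.extend(f"row {i}.{f}" for f in REQUIRED_MAPPING_FIELDS if not row.get(f))
    return problems


def yaml_header(metadata, curie_map):
    """Render metadata as the '#'-commented YAML block at the top of an SSSOM TSV."""
    # json.dumps yields double-quoted strings, which YAML also accepts.
    lines = [f"#{key}: {json.dumps(value)}" for key, value in metadata.items()]
    lines.append("#curie_map:")
    lines.extend(f"#  {prefix}: {json.dumps(uri)}" for prefix, uri in curie_map.items())
    return "\n".join(lines) + "\n"


metadata = {
    # Hypothetical identifier and license: substitute the project's real choices.
    "mapping_set_id": "https://example.org/n3c/fhir-omop-gender.sssom.tsv",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "mapping_set_version": "2022-08-17",
    "mapping_set_description": (
        "Manually curated alignment of FHIR gender to OMOP gender concept_id. "
        "Intended to be used for ontological analysis and grouping of concept "
        "mapping in N3C and in Vulcan."
    ),
}
curie_map = {
    "fhir-gender": "http://hl7.org/fhir/administrative-gender",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

if __name__ == "__main__":
    print(yaml_header(metadata, curie_map), end="")
    # A row without mapping_justification is reported as incomplete.
    print(missing_required(metadata, [{"subject_id": "fhir-gender:male",
                                       "predicate_id": "skos:exactMatch",
                                       "object_id": "OMOP:8507"}]))
```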

stephanieshong, Aug 17 '22

Alright, sounds interesting! I'm happy to spend some time on your questions during a call this week or next, but before we move to your specific questions, I'd like to invite you to complete:

  • https://mapping-commons.github.io/sssom/tutorial/ and
  • https://mapping-commons.github.io/sssom/mapping-predicates/

That will give you a better sense of SSSOM first. Is that OK?

@joeflack4 I think dot notation is fine for this, but feel free to run specific mapping sets by me!

matentzn, Aug 18 '22

@matentzn - ok thank you. Looks like next week may be a better option.

stephanieshong, Aug 18 '22

@hlehmann17 I'm @-mentioning you here if you wanted to share any thoughts regarding expressing intention for ValueSets in regards to mappings. You mentioned wanting to do this in a computational / data-model way. I thought I remembered seeing more from FHIR on this in the UsageContext type, but there isn't much on that page. If I find more, I'll share.

joeflack4, Aug 19 '22

valuesets - set of permissible values for a table column of a CDM (usually an enumerated list of values)
valueset mapping - mappings from a source CDM valueset to a target (e.g. FHIR or another CDM)
codesets - set of related concepts that describes a condition / disease / medication / lab result. Note: codesets can contain both translated code/codesystem values and the "maps to" code/codesystem values, i.e. both the source code and the translated target code.

In FHIR, the ConceptMap resource holds the mapping information, i.e. "a statement of relationships from one set of concepts to one or more other (target) concepts - either concepts in code systems, or data element/data element concepts, or classes in class models."

In N3C we have both valueset mappings and code/codesystem mappings, and we would like to contribute that knowledge in SSSOM format. Hope this clarifies.
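Because a FHIR ConceptMap holds the same kind of information as an SSSOM mapping set, one hypothetical way to go from SSSOM rows to a ConceptMap is sketched below. It targets the FHIR R4 ConceptMap shape (`group.element.target.equivalence`) and assumes a simplified SKOS-to-equivalence correspondence covering only exact, broad, and narrow matches; R5 replaced `equivalence` with `relationship`, so treat this as illustrative. The target system URI is a placeholder.

```python
import json

# Assumed, simplified correspondence between SKOS predicates and FHIR R4
# ConceptMap.group.element.target.equivalence codes.
SKOS_TO_R4_EQUIVALENCE = {
    "skos:exactMatch": "equivalent",
    "skos:broadMatch": "wider",      # object (target) is broader than subject (source)
    "skos:narrowMatch": "narrower",  # object (target) is narrower than subject (source)
}


def local_code(curie):
    """Return the local part of a CURIE, e.g. 'fhir-gender:male' -> 'male'."""
    return curie.split(":", 1)[1]


def sssom_to_concept_map(rows, source_system, target_system):
    """Group SSSOM rows by subject code into a single R4-style ConceptMap group."""
    elements = {}
    for row in rows:
        equivalence = SKOS_TO_R4_EQUIVALENCE.get(row["predicate_id"])
        if equivalence is None:
            continue  # predicates outside the table above are skipped in this sketch
        code = local_code(row["subject_id"])
        element = elements.setdefault(code, {"code": code, "target": []})
        if row.get("subject_label"):
            element["display"] = row["subject_label"]
        target = {"code": local_code(row["object_id"]), "equivalence": equivalence}
        if row.get("object_label"):
            target["display"] = row["object_label"]
        element["target"].append(target)
    return {
        "resourceType": "ConceptMap",
        "status": "draft",
        "group": [{"source": source_system, "target": target_system,
                   "element": list(elements.values())}],
    }


if __name__ == "__main__":
    rows = [{"subject_id": "fhir-gender:male", "subject_label": "Male",
             "predicate_id": "skos:exactMatch",
             "object_id": "OMOP:8507", "object_label": "Male"}]
    # "https://example.org/omop" is a hypothetical placeholder for the OMOP system URI.
    cm = sssom_to_concept_map(rows, "http://hl7.org/fhir/administrative-gender",
                              "https://example.org/omop")
    print(json.dumps(cm, indent=2))
```

Predicates without a clear FHIR counterpart (for example `skos:relatedMatch`) are skipped here; a real converter would need an explicit policy for them.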

stephanieshong, Aug 19 '22

Re "intention": Siggie may disagree that anyone reuses concept sets, but if they were to do so, they would use an "intention" description to screen for a relevant concept set. (I.e., they would search on name, but then look at the intention to decide whether to focus on one concept set or the other, and then look at the concept expression or concept codes and their overlap.) At least, this is my presumption. But if I am correct that concept sets should be disambiguated by intention, and not just by the concept expression or the extensional list of codes, then attention should be paid to intentions. I am disheartened that there is little metadata in Atlas concept sets; VSAC has good metadata. So first, some analysis is needed to look through lists of intention texts. If there is some regularity, then a structure could be made for it and tucked inside the FHIR ValueSet, probably in the "purpose" element, or perhaps "scope". With structure, a computer tool could help a user find the right value set for their question, assuming the question could be posed in a "language" that coheres with the "intention" language. That's my thought, at any rate. If there is an effort to map between value sets, I think that considering intention, in a computational way, is important.

hlehmann17, Aug 19 '22

@hlehmann17 thank you for the analysis! Could you give a few examples of "intention" that would illustrate the kind of granularity we are talking about?

@stephanieshong Thank you for providing your definitions. Some clarifying questions:

  • Can valuesets be considered "codesets" if all individual values are mapped to standard codes from a code system, or would you not say that?
  • When talking about valueset mapping, is it correct to say that we need two levels of mappings:
    • One value set A maps to another value set B
    • Every code in value set A maps to a code in value set B, and vice versa
    • There cannot be any value in value set A that is not mapped to some value in value set B, and vice versa
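The conditions listed above (every value mapped in both directions, with a one-to-one correspondence) can be checked mechanically. This sketch, with hypothetical names, reports where a set of (A, B) pairs falls short of a bijection; the target codes `M` and `F` in the example are an illustrative subset, not a complete value set.

```python
from collections import defaultdict


def check_bijection(value_set_a, value_set_b, pairs):
    """Report how (a, b) mapping pairs fall short of a one-to-one, onto mapping."""
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    return {
        # values that no mapping covers (the mapping is not onto in that direction)
        "unmapped_in_a": sorted(set(value_set_a) - set(a_to_b)),
        "unmapped_in_b": sorted(set(value_set_b) - set(b_to_a)),
        # values mapped more than once (the mapping is not one-to-one)
        "one_to_many_in_a": sorted(a for a, targets in a_to_b.items() if len(targets) > 1),
        "many_to_one_in_b": sorted(b for b, sources in b_to_a.items() if len(sources) > 1),
    }


def is_bijection(report):
    """True when every list in a check_bijection report is empty."""
    return not any(report.values())


if __name__ == "__main__":
    fhir_gender = {"male", "female", "other", "unknown"}
    illustrative_target = {"M", "F"}  # hypothetical subset, for illustration only
    report = check_bijection(fhir_gender, illustrative_target,
                             [("male", "M"), ("female", "F")])
    print(report, is_bijection(report))
```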

matentzn, Aug 20 '22

Sure:

Low granularity: "Broad (sensitive)"

High granularity: "Create broad and generic general use concept as starting place for research teams and for use in calculating comorbidity indices. Inclusive thalassemia diagnoses that are clinically relevant to the proband patient." (For a concept set of Thalassemia)

Here you'll see both semantic issues ("clinically relevant") and pragmatic ones ("for use ..."), to use linguistic categories.

Note that the semantic issues are not discoverable in the vocabularies being mapped (ICD10 and SNOMED), but could, I suppose, be "discovered" in a third ontology, say, OMIM.

hlehmann17, Aug 21 '22

Here's another, for Peripheral Vascular Disease: "Create broad and generic general use concept as starting place for research teams and for use in calculating comorbidity indices. Focus is on diseases intrinsic to blood vessels. Follows the intention of the Charlson comorbidity index: Includes conditions due to atherosclerosis, stenosis, occlusion and diabetes; excludes conditions due to embolism or congenital conditions or limited to the skin."

hlehmann17, Aug 21 '22

Your examples pertain mostly to the intentions of the value sets themselves, am I right? Is it possible, according to your view on value set mappings, that the mapping itself has a specific intention? So, could there be a case for mapping one value set with high granularity to another with low granularity, and would this imply something about the intention of the mapping?

matentzn, Aug 22 '22

Yikes! Getting pretty meta!

Certainly we’ve seen intentions of mappings:

  1. Desire for absolute coherence between the codes in the two sets (extensional coherence) (probably could be called a “specific” mapping)
  2. Desire for coherence between the intentions in the two sets (intentional coherence)
  3. Desire for pragmatic coherence (High-to-low granularity) (may also be a “sensitive” mapping)

Again, I am thinking that some empirical work is needed to ensure that this little typology is complete.

Harold

From: Nico Matentzoglu; Date: Monday, August 22, 2022 at 6:58 AM; To: mapping-commons/sssom; Cc: Harold Lehmann, Mention; Subject: Re: [mapping-commons/sssom] Several questions: Mapping non-standardized data models (Issue #221)


hlehmann17, Aug 22 '22

I wonder if we can concentrate on value sets first, before diving into code and code system mappings. I have a deadline next week to validate whether SSSOM can be used to hold the value set mappings for PCORnet, OMOP, and FHIR.

stephanieshong, Aug 22 '22

I have some meta observations about this discussion, derived in part from my struggles to understand it.

The original ticket was about mapping non-standardized data models into SSSOM, and my first thought is that making this an SSSOM problem may not be a good thing (because it makes SSSOM itself less computable in standardized semantic space—I can't count on SSSOM being meaningful with respect to W3C semantic artifacts any more, with all the benefits that conveys). At any rate, this seemed to me a mechanical question.

But then @joeflack4 commented:

… if you wanted to share any thoughts regarding expressing intention for ValueSets in regards to mappings.

And at that point the discussion went in what seemed to me a totally different direction, one mostly about intentions, and one that seems appropriate to me for a different ticket, or even two (one about expressing intentions for mappings in SSSOM, and a broader one about expressing intentions of a semantic asset in a computational way).

Whether or not that happens, I think these terms (valuesets, valueset mapping, and codesets) are being discussed as if the definitions here are the only ones we have to worry about. (So we might reach a definitive conclusion that this is how SSSOM should represent valueset mapping, or codeset mapping.) Whereas I believe the terms are often defined very locally to the systems in which they are used, and that a question like "can valuesets be considered "codesets" [if] all individual values are mapped to standard codes from a code system, or would you not say that?" is not a meaningful one for SSSOM to answer, because we are not experts in the use of valuesets and codesets in the original domains.

So if I were to take the original question and example:

As can be seen above by OMOP:demographics.sex.m, we were thinking of using a dot-notation. The part on the very right would be the code, and everything before that would be the path. But it is unclear: Sometimes the path could be <table>.<field>, as in SQL, or it could represent a JSON or XML path. But I think that there is enough variety in data models that the meaning of the path defined by the dot-notation could be very ambiguous.

I would say establishing a generic mapping language to get from alternative semantic models, to a semantic model that is compatible with RDF triples, is a tricky business indeed, and one prone to ambiguity. And maybe it should be handled not in the SSMOC context, but in a much more complete semantic context, and by the specific programs that have the need and the understanding of the models they want a solution for. Then those program(s) can pick the translation that makes sense for their circumstances, and that might let them migrate to a more W3C-like semantic representation model someday. In my experience this translation task is often a mechanical task, as the example suggests, but is only as successful as the original model—and its semantic rigor—allows.

graybeal, Aug 23 '22

@graybeal My thoughts:

my first thought is that making this an SSSOM problem may not be a good thing

I'm beginning to think that you may be right. I think the problem here is that the team leads of the participants in this thread, as well as some participants (myself included), are very bullish on mapping standardization and have been trying to stretch SSSOM beyond its declared scope (W3C semantic artifacts, e.g. RDF, OWL) to handle broader problems.

Perhaps, if no other standard already exists, it would be good to create a superset standard for arbitrary data model mapping (SSSMM? MM = Model Mapping) in tabular format, with many of the fields that SSSOM has. Nothing comes up on a Google search for "data standard for data model mapping". My programmer spidey sense is tingling, though; maybe I'm not searching with the right words.

(because it makes SSSOM itself less computable in standardized semantic space—I can't count on SSSOM being meaningful with respect to W3C semantic artifacts any more,

If SSSOM's scope broadens without introducing breaking changes, wouldn't this have no effect on computability in the core semantic scope?

a totally different direction, one mostly about intentions, and one that seems appropriate to me for a different ticket

I think you're right, and that's my bad. I will likely open a separate ticket for this.

we are not experts in the use of valuesets and codesets in the original domains

I think you're right. I've heard various people's definitions of "value set", "code set", "concept set", and "enumeration". I have seen good ideas, but no consensus. Not that we can't make progress here, but given the lack of official dictionary or Wikipedia definitions, this is a difficult, semantically blurry space.

SSMOC

What does that stand for?

joeflack4, Aug 23 '22

Sorry for breaching the scope of this ticket; I just found the comment from @hlehmann17 quite interesting and wanted to follow up. I have moved the intention discussion here: https://github.com/mapping-commons/sssom/discussions/223.

Part of our process as a community is to delineate the scope of SSSOM clearly. I think @graybeal's and @joeflack4's concerns are all valid and mirror my own. However, like @joeflack4, I do think it is part of our mandate to at least explore the exact requirements of this space, and then provide a profile or something similar that would allow us to properly serve that community. We are certainly not bending SSSOM to squeeze value set mappings in, but if we can extend it to accommodate them, we should at least explore that.

matentzn, Aug 24 '22

It might help if we limit this particular issue to valueset mapping. When I say valuesets, I mean an enumerated list of permissible values for an object property or common data element. Most of the time the list has fewer than 10 values, not hundreds or thousands as in codesets. I already have CDM-to-CDM mappings, stored in CSV format. This CSV format is lossy because it has no place to hold much of the metadata. Therefore, I am more interested in knowing whether I can use SSSOM to specify mappings from FHIR valuesets to CDM valuesets, which would provide a place for metadata such as version and any other information we may want to persist with the mapping. It might also explain how to define a mapping from an object model to a tabular model, i.e. FHIR (JSON) to CDM (CSV).

So I wonder if we could explore whether SSSOM can persist the mapping information between FHIR and CDM valuesets, using the gender valueset mapping as an example. That would at least help me validate that I can use SSSOM.

Can SSSOM accommodate valuesets mapping between FHIR and CDM?
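On the FHIR (JSON) to CDM (CSV) question, here is a minimal sketch of how an SSSOM file could drive the object-to-tabular step: it reads `skos:exactMatch` rows from a small SSSOM TSV and flattens a FHIR Patient's `gender` into OMOP `person` columns (`person_source_value`, `gender_concept_id`, `gender_source_value`). The embedded TSV, the `OMOP:` prefix convention, and the function names are assumptions for illustration; falling back to concept ID 0 follows OMOP's convention for values with no matching concept.

```python
import csv
import io
import json

# A tiny SSSOM TSV (metadata header abbreviated); OMOP IDs from the Athena links above.
SSSOM_TSV = (
    "#curie_map:\n"
    '#  fhir-gender: "http://hl7.org/fhir/administrative-gender"\n'
    "subject_id\tpredicate_id\tobject_id\n"
    "fhir-gender:male\tskos:exactMatch\tOMOP:8507\n"
    "fhir-gender:female\tskos:exactMatch\tOMOP:8532\n"
)


def load_exact_matches(tsv_text):
    """Map subject_id -> object_id for skos:exactMatch rows, skipping '#' header lines."""
    body = "\n".join(line for line in tsv_text.splitlines() if not line.startswith("#"))
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    return {r["subject_id"]: r["object_id"] for r in reader
            if r["predicate_id"] == "skos:exactMatch"}


def patient_to_person_row(patient_json, matches):
    """Flatten a FHIR Patient (JSON) into a few OMOP person-table columns."""
    patient = json.loads(patient_json)
    gender = patient.get("gender", "")
    object_id = matches.get(f"fhir-gender:{gender}")
    # Strip the (assumed) "OMOP:" prefix; 0 is OMOP's "no matching concept" ID.
    concept_id = int(object_id.split(":", 1)[1]) if object_id else 0
    return {
        "person_source_value": patient.get("id"),
        "gender_concept_id": concept_id,
        "gender_source_value": gender,
    }


if __name__ == "__main__":
    matches = load_exact_matches(SSSOM_TSV)
    patient = '{"resourceType": "Patient", "id": "p1", "gender": "female"}'
    print(patient_to_person_row(patient, matches))
```

The point of the design is that the SSSOM file, not the conversion code, carries the mapping and its metadata, so the same code works if the mapping set is revised.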

stephanieshong, Aug 24 '22

@matentzn my answers below:

can valuesets be considered "codesets" [if] all individual values are mapped to standard codes from a code system, or would you not say that?

When I say valuesets, I mean an enumerated list of permissible values for an object property or common data element. Most of the time the enumerated list is rather short.

When talking about valueset mapping, is it correct to say that we need [two] levels of mappings:

It is a one-to-one mapping, not one-to-many: every code in value set A maps to exactly one code in value set B (an exact 1:1 mapping).

Did I answer your question?

stephanieshong, Aug 24 '22

Notes from 2022/08/24 meeting

Sets (x):

  1. Gender
  2. Race
  3. Ethnicity

Specific value set mapping instances Stephanie is tasked to do:

  1. PCORnet x (defined by the US CDC gender value set) to OMOP x
  2. FHIR x to OMOP x

Nico: The URIs in Stephanie's use case don't need to resolve.

joeflack4, Aug 24 '22

@joeflack4 - the mapping is FHIR to OMOP and FHIR to PCORnet, not CDM to CDM (we already have the mapping from PCORnet to OMOP).

stephanieshong, Aug 24 '22

All these clarifications help considerably to make the question more concrete. It seems to boil down to 2 points:

(1) Can SSSOM specify a mapping (in this example between FHIR valuesets and CDM valuesets) while holding additional metadata about the mapping (like version and anything else to be preserved)?
(2) How can we define a concept in SSSOM in (a) an object model (e.g. FHIR/JSON) and (b) a tabular model (e.g. CDM/CSV)?

Does that sum it up?

graybeal, Aug 24 '22

Yes, pretty much. To your item (2) above, I would add: can we define both valueset mappings and concept mappings in SSSOM?

stephanieshong, Aug 24 '22

https://github.com/mapping-commons/sssom/discussions/223

404?

chrisroederucdenver, Sep 06 '22

  • Every code in value [set] A maps to a code in value set B, and vice versa
  • There cannot be any value in value set A that is not mapped to some value in value set B, and vice versa

That is, one-to-one and onto.

Is this required by SSSOM?

chrisroederucdenver, Sep 06 '22

Hi @chrisroederucdenver!

No, not a requirement of SSSOM at all. It derived from the particular needs of the use case that started this thread, if I recall correctly.

graybeal, Sep 06 '22

Hi @graybeal! Nice to see a familiar face! Like you, I'm trying to make sense of it. I'm throwing together some examples of mapping sex attributes, and they are not all onto! I see where I misread: Nico was asking. Thanks.

chrisroederucdenver, Sep 07 '22

https://github.com/mapping-commons/sssom/discussions/223

404?

Moved to https://github.com/mapping-commons/sssom/issues/224

matentzn, Sep 08 '22