OMOP2OBO Improve string delimiter detection in mapping pipeline

Open callahantiff opened this issue 3 years ago • 1 comment

Describe the Bug

The mapping pipeline assumes that all concept synonym and ancestor information is input in an aggregated format, with each aggregated concept separated by a | delimiter. This assumption is brittle: input that mixes delimiters is not split correctly. Specifications for the input data can be found here: resources/clinical_data/README.md

EXAMPLE:
Input Data
The CONCEPT_SYNONYM column below displays data in the expected input format

CONCEPT_ID	CONCEPT_SOURCE_CODE	CONCEPT_LABEL	CONCEPT_SOURCE_LABEL	CONCEPT_SYNONYM
37018594	snomed:80251000119104	Complement level below reference range	Complement level below reference range	Complement level below reference range | Complement level below reference range (finding)

Example of Data that Breaks Assumptions:
The CONCEPT_SYNONYM column below displays data in an unexpected input format (i.e., two types of delimiters: | and ;)

CONCEPT_ID	CONCEPT_SOURCE_CODE	CONCEPT_LABEL	CONCEPT_SYNONYM
40771573	loinc:69052-9	Flow cytometry specialist review of results	Flow cytometry specialist review of results | Flow cytometry specialist review | Dynamic; Impression; Impression/interpretation of study; Impressions; Interp; Interpretation; Misc; Miscellaneous; Narrative; Other; Point in time; Random; Report; To be specified in another part of the message; Unspecified
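
To make the failure concrete, here is a minimal sketch of what splitting on | alone does to a row like the LOINC example above. This is not the pipeline's actual code; the variable names are illustrative and the synonym string is shortened.

```python
# Shortened version of the LOINC CONCEPT_SYNONYM value shown above.
synonym_field = (
    "Flow cytometry specialist review of results | "
    "Flow cytometry specialist review | "
    "Dynamic; Impression; Interp"
)

# Split on "|" only, which is what the current input assumption implies.
parts = [s.strip() for s in synonym_field.split("|")]

print(len(parts))  # 3
print(parts[-1])   # Dynamic; Impression; Interp
```

The semicolon-separated synonyms survive as a single string, so none of them can be matched individually in an exact match step.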

Impact Level

LOW - the string similarity mapping pipeline correctly handles all delimiter types, which allows it to recover mappings missed in the exact match part of the pipeline.

Impacted Scripts

omop2obo/clinical_concept_annotator.py

Solution

[ ] Add a parameter to pass delimiter type
[ ] Improve tests to better vet input delimiters
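
One possible shape for the delimiter parameter, as a rough sketch only: `split_aggregated` and its `delimiters` argument are hypothetical names and are not taken from omop2obo/clinical_concept_annotator.py.

```python
import re
from typing import Iterable, List


def split_aggregated(value: str, delimiters: Iterable[str] = ("|",)) -> List[str]:
    """Hypothetical helper: split an aggregated field on any of the given delimiters.

    Returns stripped, non-empty, de-duplicated parts in their original order.
    """
    delimiter_list = list(delimiters)
    if not delimiter_list:
        raise ValueError("at least one delimiter is required")

    # Escape each delimiter so characters such as "|" are matched literally.
    pattern = "|".join(re.escape(d) for d in delimiter_list)

    seen = set()
    result = []
    for part in re.split(pattern, value):
        part = part.strip()
        if part and part not in seen:
            seen.add(part)
            result.append(part)
    return result


row = "Flow cytometry specialist review | Dynamic; Impression; Interp"

# Default behaviour matches the current "|"-only assumption.
print(split_aggregated(row))
# Passing both delimiters splits the LOINC-style synonyms as well.
print(split_aggregated(row, delimiters=("|", ";")))
```

Keeping | as the default would leave existing inputs unchanged. Note that splitting on ; for every vocabulary could break synonyms that legitimately contain a semicolon, so the delimiter set may need to be chosen per data source.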

Oct 09 '20 16:10 callahantiff

[x] Temporary workaround provided for release v1.0, which handles irregular LOINC synonym strings in the SQL query

Oct 22 '20 19:10 callahantiff
