cui2vec
Lookup dictionary for pretrained embedding
Hi Andrew,
Do you have a lookup dictionary for the pretrained embeddings? I saw that in the embedding file the "medical concepts" are in the format "CXXXX"; I'm not sure whether they are ICD codes, procedure codes, or something else.
Thanks!
Hello Victor,
I have been looking into this work recently. I think the CUI mapping files / conversion scripts can be found in the embeddings repository: https://github.com/clinicalml/embeddings/tree/master/eval
Cheers
the "medical concepts" are in format of "CXXXX", not sure if they are ICD codes, procedure codes or something else
These are UMLS concept unique identifiers (CUIs).
Examples from https://arxiv.org/pdf/1804.01486.pdf:
Primary condition: premature infant (CUI: C0021294)
Comorbidity: bronchopulmonary dysplasia (CUI: C0006287)
UMLS CUIs can be browsed at https://uts.nlm.nih.gov/metathesaurus.html (N.B. you need to register first).
Came across this post while looking for information on the meaning of the columns in the cui2vec_pretrained.csv file. The columns are named v1, v2, ..., v500. Where can we get information on what these 500 columns stand for?
If we were to load this csv file into a database, what kind of schema should we create? (Or does it even make sense to load this into a database in the first place?) I have read https://arxiv.org/pdf/1804.01486.pdf multiple times but could not find any information on the structure of this pretrained csv file. Any help is greatly appreciated.
The columns are named v1, v2, ..., v500. Where can we get information on what these 500 columns stand for?
v1, ..., v500 are the components of the 500-dimensional embedding vector for each CUI.
Quoting the paper from Section 4.1:
The 500-dimensional word2vec style embeddings using the combined data are referred to
as the cui2vec embeddings in all subsequent experiments.
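To make the structure concrete, here is a minimal sketch that reads the file with pandas and treats each row as a 500-dimensional vector. It assumes the first column of cui2vec_pretrained.csv holds the CUI string and the remaining 500 columns are v1..v500, and that the two CUIs from the example above are present; adjust the file name and index_col if your copy differs.

```python
import numpy as np
import pandas as pd

# Assumption: first column = CUI, remaining 500 columns = v1..v500.
df = pd.read_csv("cui2vec_pretrained.csv", index_col=0)

vec = df.loc["C0006287"].to_numpy()  # 500-dimensional embedding for one CUI
print(vec.shape)                     # (500,)

# Cosine similarity between two concepts as a quick sanity check.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(df.loc["C0021294"].to_numpy(), df.loc["C0006287"].to_numpy()))
```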
Loading cui2vec: You can use gensim as explained in https://github.com/RaRe-Technologies/gensim-data/issues/25#issuecomment-535042220
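Along those lines, here is a minimal sketch (file names and the CSV layout assumed as above, not taken verbatim from the linked issue) that converts the CSV to the plain-text word2vec format and loads it with gensim's KeyedVectors:

```python
import pandas as pd
from gensim.models import KeyedVectors

# Assumption: CUI in the first column, v1..v500 in the remaining columns.
df = pd.read_csv("cui2vec_pretrained.csv", index_col=0)

# Write the embeddings in the plain-text word2vec format gensim can read:
# a header line "<num_vectors> <dimensions>", then one "<CUI> <v1> ... <v500>" per line.
with open("cui2vec.w2v.txt", "w") as f:
    f.write(f"{df.shape[0]} {df.shape[1]}\n")
    for cui, row in df.iterrows():
        f.write(cui + " " + " ".join(map(str, row.values)) + "\n")

kv = KeyedVectors.load_word2vec_format("cui2vec.w2v.txt", binary=False)
print(kv.most_similar("C0006287", topn=5))  # nearest CUIs by cosine similarity
```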
As a prerequisite, you should read about word embeddings, e.g. word2vec. That will help you understand vector embeddings of text.