cui2vec

Co-occurrence matrix

Open KrishnaPG opened this issue 4 years ago • 2 comments

In Section 2.4 of the paper:

a CUI-CUI co-occurrence matrix is constructed, ... For nonclinical text data (e.g., journal articles), it is first preprocessed (see Section 3) and chunked into fixed length windows of 10 words, and a co-occurrence is counted as the appearance of a CUI-CUI pair in the same window. For claims data, ICD-9 codes are mapped to UMLS CUIs and a co-occurrence is counted as the number of patients in which two CUIs appear in any 30-day period. Finally, for the clinical notes, we counted a co-occurrence as two CUIs appearing in the same 30-day ‘bin’
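
For anyone following along, the fixed-window counting that the quote describes for nonclinical text could be sketched roughly as below. This is only an illustration, not the authors' code: the function name `window_cooccurrence` and the `is_cui` predicate are hypothetical, the windows are assumed to be non-overlapping, and each unordered pair of distinct CUIs is assumed to count once per window. The paper excerpt does not settle any of these details.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrence(tokens, is_cui, window=10):
    """Count CUI-CUI pairs that share a fixed-length window.

    tokens: preprocessed token sequence in which concepts are already
            replaced by their CUIs (an assumption of this sketch).
    is_cui: hypothetical predicate that says whether a token is a CUI.
    window: window length in tokens (10 in the quoted passage).
    """
    counts = Counter()
    # Chunk the sequence into consecutive, non-overlapping windows.
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        # Distinct CUIs in this window, sorted so each pair has one canonical order.
        cuis = sorted({t for t in chunk if is_cui(t)})
        # Each unordered pair is counted once per window (assumption).
        for pair in combinations(cuis, 2):
            counts[pair] += 1
    return counts


# Toy example with made-up identifiers and a window of 4 tokens.
toy = ["C1", "a", "C2", "b", "C1", "C3", "c", "C2", "C2"]
pairs = window_cooccurrence(toy, lambda t: t.startswith("C"), window=4)
```

The resulting counter maps sorted CUI pairs to counts, which is one simple sparse representation of a symmetric co-occurrence matrix. The 30-day patient-level counting for claims data and clinical notes would need a different grouping step and is not covered here.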

Would you be able to provide access to the co-occurrence matrices built from these three separate sources? They are a very powerful data structure and could lead to further investigations. We already hold a UMLS license and, if the data cannot be released publicly, can reach out to you privately to arrange download access.

Thank you

KrishnaPG, Dec 16 '20 10:12

I'd like to know the structure of the co-occurrence matrix, if at all possible.

GregSilverman, Feb 12 '22 01:02

Never mind, I see all data are in the vignettes folder.

GregSilverman, Feb 12 '22 21:02