EntityLinker knowledge base returns CUIs not MeSH IDs when 'mesh' is selected

Open xegulon opened this issue 4 years ago • 6 comments

I'm using the scispaCy entity linker with this snippet:

from scispacy.linking import EntityLinker
import spacy, scispacy

config = {
    "resolve_abbreviations": True,  
    "name": "mesh", 
    "max_entities_per_mention":1
}

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker", config=config) 

linker = nlp.get_pipe("scispacy_linker")

def mesh_extractor(text):
    doc = nlp(text)
    for e in doc.ents:
        if e._.kb_ents:
            cui = e._.kb_ents[0][0]
            print(e, cui)

text = "Give him three injection of paracetamol"

Then when I use it:

>> mesh_extractor(text)
Give C1947971
injection C0021485

But in the scispaCy README, I see that for MeSH it should return the specific MeSH IDs (for example, D003435), not UMLS CUIs. How can I fix this? Did I misunderstand something?

xegulon avatar May 17 '21 14:05 xegulon

ahh, the config parameter is called linker_name, not name. If you set linker_name instead, it should work.
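For reference, the config from the original post with the key renamed as suggested would look roughly like this. This is a minimal sketch that only changes `name` to `linker_name`; everything else is copied from the snippet above.

```python
# Same config as in the original post, but with the key "linker_name"
# instead of "name", as suggested in the reply above.
config = {
    "resolve_abbreviations": True,
    "linker_name": "mesh",
    "max_entities_per_mention": 1,
}

# It is then passed to the pipe exactly as before:
# nlp.add_pipe("scispacy_linker", config=config)
```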

dakinggg avatar May 18 '21 19:05 dakinggg

Thanks a lot!

xegulon avatar May 19 '21 08:05 xegulon

I am getting the same problem even when using linker_name in the config:

config = {
    "resolve_abbreviations": True,
    "linker_name": "mesh",
    "max_entities_per_mention": 5
}

nlp = spacy.load("en_core_sci_md")

nlp.add_pipe("scispacy_linker", config=config)

linker = nlp.get_pipe("scispacy_linker")

doc = nlp("Pre-diabetes Obesity Type-2 Diabetes Mellitus Obesity Overweight")

for e in doc.ents:
    if e._.kb_ents:
        cui = e._.kb_ents[0][0]
        print(e, cui)

and I get:
Pre-diabetes C0362046
Obesity C0028754
Diabetes Mellitus C0011849
Obesity C0028754
Overweight C0497406

I also tried another scispaCy model, nlp = spacy.load("en_ner_bionlp13cg_md"), in the same script. I don't know if it matters.

Braianpp avatar Jul 04 '23 19:07 Braianpp

Hi, it looks like the original mesh linker was created with a separate kb, rather than just a subset of UMLS. The process for creating the linker may have been lost. When I recreated the linkers for the latest UMLS release, I just used a subset of UMLS to produce the mesh linker. I'll have to look into this and decide whether to just stick to the current UMLS ids, or try to recreate the old version of the linker. Sorry about that. For now you will need to map between UMLS id and mesh id yourself.

dakinggg avatar Jul 05 '23 01:07 dakinggg

I see. Maybe I will try the previous scispaCy version (0.5.1), which should work. Thank you very much for answering my question!

Braianpp avatar Jul 05 '23 02:07 Braianpp

Also facing this problem, but I am able to map to MeSH from UMLS CUIs using the MRCONSO.RRF file
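A minimal sketch of that kind of mapping, in case it helps others. It assumes the standard pipe-delimited MRCONSO.RRF layout (CUI in column 0, source abbreviation SAB in column 11, source code CODE in column 13) and that MeSH rows are the ones with SAB equal to "MSH". The function names and the file path in the usage note are hypothetical, not part of scispaCy.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed column positions in MRCONSO.RRF (pipe-delimited, no header row).
CUI_COL, SAB_COL, CODE_COL = 0, 11, 13


def build_cui_to_mesh(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each UMLS CUI to the set of MeSH codes found in MRCONSO rows."""
    cui_to_mesh = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE_COL:
            continue  # skip malformed or truncated rows
        if fields[SAB_COL] == "MSH":  # keep only rows sourced from MeSH
            cui_to_mesh[fields[CUI_COL]].add(fields[CODE_COL])
    return dict(cui_to_mesh)


def load_cui_to_mesh(path: str) -> Dict[str, Set[str]]:
    """Hypothetical helper: read MRCONSO.RRF from disk and build the map."""
    with open(path, encoding="utf-8") as f:
        return build_cui_to_mesh(f)
```

Usage would be along the lines of `cui_to_mesh = load_cui_to_mesh("MRCONSO.RRF")` followed by `cui_to_mesh.get(cui, set())` for each CUI returned by the linker. A CUI can map to zero or several MeSH codes, and the codes may include more than descriptor IDs, so some filtering may be needed depending on the use case.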

JohnGiorgi avatar Jul 25 '23 19:07 JohnGiorgi