EntityLinker knowledge base returns CUIs not MeSH IDs when 'mesh' is selected
I'm using the scispaCy entity linker with this snippet:
from scispacy.linking import EntityLinker
import spacy, scispacy
config = {
    "resolve_abbreviations": True,
    "name": "mesh",
    "max_entities_per_mention": 1
}
nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker", config=config)
linker = nlp.get_pipe("scispacy_linker")
def mesh_extractor(text):
    doc = nlp(text)
    for e in doc.ents:
        if e._.kb_ents:
            # kb_ents is a list of (concept_id, score) pairs; take the top candidate's ID
            cui = e._.kb_ents[0][0]
            print(e, cui)
text = "Give him three injection of paracetamol"
Then when I use it:
>> mesh_extractor(text)
Give C1947971
injection C0021485
But in the scispaCy README, I see that for MeSH it should not return UMLS CUIs but the specific MeSH IDs (for example, D003435). How can I fix this? Did I misunderstand something?
Ahh, the config parameter is called linker_name, not name. If you set linker_name instead, it should work.
Thanks a lot!
I am getting the same error even when using linker_name in the config:
config = {
    "resolve_abbreviations": True,
    "linker_name": "mesh",
    "max_entities_per_mention": 5
}
nlp = spacy.load("en_core_sci_md")
nlp.add_pipe("scispacy_linker", config=config)
linker = nlp.get_pipe("scispacy_linker")
doc = nlp("Pre-diabetes Obesity Type-2 Diabetes Mellitus Obesity Overweight")
for e in doc.ents:
    if e._.kb_ents:
        cui = e._.kb_ents[0][0]
        print(e, cui)
and I get:
Pre-diabetes C0362046
Obesity C0028754
Diabetes Mellitus C0011849
Obesity C0028754
Overweight C0497406
I also used another scispaCy model, nlp = spacy.load("en_ner_bionlp13cg_md"), in the same script. I don't know if it matters.
Hi, it looks like the original MeSH linker was created with a separate KB, rather than just a subset of UMLS. The process for creating that linker may have been lost. When I recreated the linkers for the latest UMLS release, I just used a subset of UMLS to produce the MeSH linker. I'll have to look into this and decide whether to stick with the current UMLS IDs or try to recreate the old version of the linker. Sorry about that. For now, you will need to map between UMLS IDs and MeSH IDs yourself.
I see. Maybe I will try the previous scispaCy version (0.5.1), which should work. Thank you very much for answering my question!
I'm also facing this problem, but I am able to map from UMLS CUIs to MeSH IDs using the MRCONSO.RRF file.
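In case it helps others, here is a minimal sketch of the MRCONSO.RRF approach. It assumes the standard pipe-delimited MRCONSO layout, where the CUI is the first field, the source abbreviation (SAB) is the 12th, and the source code (CODE) is the 14th, and that MeSH rows are marked with SAB equal to "MSH". The function name build_cui_to_mesh and the file path in the usage comment are my own choices, not part of scispaCy or UMLS.

```python
from collections import defaultdict

# Zero-based column positions in MRCONSO.RRF (assumed standard UMLS layout).
CUI_COL, SAB_COL, CODE_COL = 0, 11, 13


def build_cui_to_mesh(lines):
    """Map each UMLS CUI to the set of MeSH codes listed for it.

    `lines` is any iterable of MRCONSO.RRF rows (for example, an open file).
    """
    mapping = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE_COL:
            continue  # skip malformed or truncated rows
        if fields[SAB_COL] == "MSH":
            mapping[fields[CUI_COL]].add(fields[CODE_COL])
    return dict(mapping)


# Hypothetical usage (adjust the path to your own UMLS installation):
#
#     with open("MRCONSO.RRF", encoding="utf-8") as f:
#         cui_to_mesh = build_cui_to_mesh(f)
#     mesh_ids = cui_to_mesh.get(cui, set())
```

A CUI can map to more than one MeSH code, so the values are sets rather than single IDs. The codes may also include identifiers other than D-prefixed descriptors, so you may want to filter on the prefix depending on your use case.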