stanza
How to implement negation detection for entities and link diseases to UMLS?
Hi folks,
I am new to Stanford NLP and have found it works better than other methods. I would like to know how to implement negation detection for entities, similar to what negspacy provides for spaCy.
Further, I am working on biomedical NER. Does Stanford NLP provide an option to link entities (disease, treatment) to either UMLS or SNOMED codes?
Answers to your two questions:
- Negation: The current version of Stanza does not support negation detection out of the box. However, if you find another library that can do this given input text, one way to integrate it with the Stanza pipeline is via writing a customized processor. This is made extremely easy in Stanza starting from v1.1. Essentially you just need to implement a processor called negation, which runs after ner, and contains either your implementation or some wrapper code of another library. Check out our documentation for customized processors for more details.
- Linking: Again, Stanza does not have out of the box support now. But you can easily integrate another library's linking results with the Stanza pipeline via a customized processor.
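To make the negation idea a bit more concrete, here is a minimal sketch of the kind of logic such a processor could run after ner. It is plain Python and does not call Stanza itself. The function name mark_negated, the trigger list, and the (start, end, type) token-span format are all assumptions made for illustration, and the rule (a trigger word within a few tokens before the entity) is a much cruder heuristic than what negspacy implements.

```python
# Illustrative trigger words only (an assumption, not a complete clinical list).
NEGATION_TRIGGERS = {"no", "not", "without", "denies", "denied", "negative"}

def mark_negated(tokens, entities, window=3):
    """Return (entity_text, entity_type, is_negated) for each entity.

    tokens:   list of token strings for one sentence.
    entities: list of (start, end, type) token spans, end exclusive
              (a hypothetical format chosen for this sketch).
    An entity is flagged when a trigger word occurs within `window`
    tokens immediately before it.
    """
    results = []
    for start, end, ent_type in entities:
        left_context = tokens[max(0, start - window):start]
        negated = any(tok.lower() in NEGATION_TRIGGERS for tok in left_context)
        results.append((" ".join(tokens[start:end]), ent_type, negated))
    return results

tokens = "Patient denies fever but has hypertension".split()
entities = [(2, 3, "PROBLEM"), (5, 6, "PROBLEM")]
print(mark_negated(tokens, entities))
# [('fever', 'PROBLEM', True), ('hypertension', 'PROBLEM', False)]
```

Inside a customized processor, logic like this would presumably run over each sentence of the document and attach the flag to the entities; the exact hooks for that are described in the customized processor documentation.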
@yuhaozhang Yes, it helped me.
I have medical text that contains a few grammatical mistakes. In this case, Stanza is not giving me accurate entities.
My text is: fever hypertension diabetes mellitus
When I extract entities using Stanza, it outputs them as a single entity:
fever hypertension diabetes mellitus 'PROBLEM'
In the medical context, these are supposed to be three different entities:
fever 'PROBLEM'
hypertension 'PROBLEM'
diabetes mellitus 'PROBLEM'
Please suggest how I can overcome this issue.
I cannot think of a magic fix to this issue. The entity tagger is statistical in nature, and it is hard to directly modify its behavior without retraining it with better data. As you said, your text has some grammatical mistakes (and, judging from the examples you provided, missing punctuation), while the NER tagger was trained on cleaner text, so its performance is likely to degrade on your text.
One quick fix I can think of, which requires some lexicon, is to implement an NER "post-fix" processor that takes each span output by the NER tagger and tries to split it further into several entities (as in your example). You can do this via a customized processor, but you'll need to implement some rules or lexicons for your specific domain and corpus.
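As a rough illustration of the "post-fix" idea, the sketch below splits one entity string with a greedy longest-match lookup against a lexicon. It is plain Python with no Stanza calls. The function name split_entity, the max_len parameter, and the tiny LEXICON are hypothetical; a real lexicon would have to come from your own domain resources.

```python
def split_entity(text, lexicon, max_len=4):
    """Greedy longest-match split of one entity string using a lexicon.

    Tokens not covered by any lexicon term are grouped into leftover
    chunks, so no text from the original span is dropped.
    """
    tokens = text.split()
    pieces, leftover, i = [], [], 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "diabetes mellitus" beats "diabetes".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in lexicon:
                match = (candidate, n)
                break
        if match:
            if leftover:
                pieces.append(" ".join(leftover))
                leftover = []
            pieces.append(match[0])
            i += match[1]
        else:
            leftover.append(tokens[i])
            i += 1
    if leftover:
        pieces.append(" ".join(leftover))
    return pieces

# Hypothetical mini-lexicon for the example from this thread.
LEXICON = {"fever", "hypertension", "diabetes", "diabetes mellitus"}
print(split_entity("fever hypertension diabetes mellitus", LEXICON))
# ['fever', 'hypertension', 'diabetes mellitus']
```

Each resulting piece would keep the type of the original span (PROBLEM here). Longest-match is only one possible design choice; it avoids cutting multi-word terms apart but depends entirely on lexicon coverage.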
"Linking: Again, Stanza does not have out of the box support now. But you can easily integrate another library's linking results with the Stanza pipeline via a customized processor."
Care to explain this a bit further? Thanks!