about `nlp.add_pipe` in the demo

Open newbietuan opened this issue 3 years ago • 6 comments

Describe the bug: When I run the demo code, something goes wrong at `nlp.add_pipe(quickumls_component)`:

Traceback (most recent call last):
  File "umlsdemo.py", line 8, in <module>
    nlp.add_pipe(quickumls_component)
  File "/home/mayt/anaconda3/envs/umls/lib/python3.7/site-packages/spacy/language.py", line 769, in add_pipe
    raise ValueError(err)
ValueError: [E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <quickumls.spacy_component.SpacyQuickUMLS object at 0x7f24b35e5cd0> (name: 'None').

  • If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.

  • If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').

  • If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.

To Reproduce

**Environment**

  • OS: Ubuntu
  • QuickUMLS version 1.4.0 post1
  • UMLS version 2021AB
  • spacy 3.2.0

Additional context: It seems related to spaCy, according to https://stackoverflow.com/questions/67906945/valueerror-nlp-add-pipe-now-takes-the-string-name-of-the-registered-component-f, but I still don't know how to modify the code.

newbietuan avatar Dec 07 '21 12:12 newbietuan

It can be used like this.

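import spacy
from spacy.language import Language
from quickumls.spacy_component import SpacyQuickUMLS

# Assumes `nlp` is an already-loaded spaCy pipeline (e.g. from spacy.load).

@Language.component('quickumls_component')
def quickumls_component(doc):
    # Builds a SpacyQuickUMLS instance and applies it to the incoming doc
    return SpacyQuickUMLS(nlp, <Path to quickUmls install dir>)(doc)

# spaCy v3 expects the registered string name, not the callable itself
nlp.add_pipe('quickumls_component', last=True)

doc = nlp(full_rpts.iloc[0])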

ygivenx avatar Feb 18 '22 18:02 ygivenx

Hi everyone, when I use this code I get this error: [E090] Extension 'similarity' already exists on Span. To overwrite the existing extension, set force=True on Span.set_extension.

shrimonmuke0202 avatar Feb 08 '23 13:02 shrimonmuke0202

@shrimonmuke0202 did you solve this problem? [E090] Extension 'similarity' already exists on Span. To overwrite the existing extension, set force=True on Span.set_extension.

ghost avatar Feb 22 '23 20:02 ghost
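
For readers hitting E090 with the snippet posted earlier in this thread: one plausible cause, assuming SpacyQuickUMLS registers its Span extensions in its constructor, is that the function component builds a new SpacyQuickUMLS for every document, so a second construction tries to register 'similarity' again. A minimal sketch of an alternative is to register a factory with @Language.factory, which spaCy calls once per add_pipe. The factory name quickumls_matcher and the function name create_quickumls_component are hypothetical, chosen only for this illustration; quickumls_fp stands for the path to your own QuickUMLS install.

```python
from spacy.language import Language


# "quickumls_matcher" is a hypothetical factory name chosen for this sketch.
@Language.factory("quickumls_matcher")
def create_quickumls_component(nlp, name, quickumls_fp: str):
    # Imported lazily so the factory can be registered even on machines
    # where QuickUMLS is not installed yet.
    from quickumls.spacy_component import SpacyQuickUMLS

    # spaCy calls this function once per nlp.add_pipe(...), so the matcher
    # (and whatever it sets up in its constructor) is built a single time.
    return SpacyQuickUMLS(nlp, quickumls_fp)
```

Usage would then look like nlp.add_pipe('quickumls_matcher', config={'quickumls_fp': <Path to quickUmls install dir>}, last=True), where nlp is any loaded spaCy v3 pipeline. Adding the pipe a second time in the same Python process would still construct a second matcher, so this sketch does not rule out E090 in that situation.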

Hi there, thank you so much for sharing a solution! I was able to get past the add_pipe error, but no further. Could you explain what the line doc = nlp(full_rpts.iloc[0]) does? I tried substituting something like doc = nlp('Pt c/o shortness of breath, chest pain, nausea, vomiting, diarrrhea'), but that does not work. Initially I copied your code verbatim, but it fails with an error saying "full_rpts" is not defined. Is there some missing context for this line of code? Thank you so much!

ysu1213 avatar Mar 15 '23 02:03 ysu1213

full_rpts.iloc[0] just returns a string from a pandas DataFrame, so doc = nlp('Pt c/o shortness of breath, chest pain, nausea, vomiting, diarrrhea') is correct. Did you update the QuickUMLS install location in the code below?

def quickumls_component(doc):
    return SpacyQuickUMLS(nlp, <Path to quickUmls install dir>)(doc)

ygivenx avatar Mar 16 '23 02:03 ygivenx

Is the Path to quickUmls install dir supposed to be the same as quickumls_fp in this code block?

matcher = QuickUMLS(quickumls_fp, ...)

If so, I am doing this yet get this message:

Loading QuickUMLS resources from a default SAMPLE of UMLS data from here: /opt/conda/envs/python38/lib/python3.8/site-packages/resources/quickumls/QuickUMLS_SAMPLE_lowercase_POSIX_unqlite

and there is no output from the print statements in the OP's code block.

However, this works fine:

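from quickumls import QuickUMLS

# Initialize the QuickUMLS matcher once and reuse it for every call
matcher = QuickUMLS("./libraries/quickumls", overlapping_criteria="score", threshold=0.99)

MAX_CHARS = 1000000  # upper bound on the number of characters passed to the matcher

def quick_UMLS_match(medical_text):
    # Slicing is a no-op for shorter strings, so no explicit length check is needed
    processed_text = medical_text[:MAX_CHARS]
    return matcher.match(processed_text, best_match=True, ignore_syntax=False)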

But I am trying to switch to medspacy, because I currently extract items from the QuickUMLS output in a very inefficient way, and medspacy seems like the proper approach. For what it's worth, this is how I do it:

def quick_UMLS_extractor(matcher_output, return_field, unique=True):
    # matcher_output is a list of match groups, each a list of candidate dicts
    return_items = [entity[return_field] for sublst in matcher_output for entity in sublst]

    if unique:
        # set() removes duplicates but does not preserve the original order
        return_items = list(set(return_items))
    return return_items
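
For the extraction step above, here is a minimal sketch of a variant that removes duplicates while keeping first-seen order. It assumes matcher.match returns a list of match groups, each a list of dicts, which is the shape the comprehension above implies. The function name extract_field and the sample data, including the CUIs, are made up for illustration.

```python
from itertools import chain


def extract_field(matcher_output, field, unique=True):
    """Collect one field from every candidate in every match group."""
    # Flatten the list of groups into a single stream of candidate dicts.
    values = [entity[field] for entity in chain.from_iterable(matcher_output)]
    if unique:
        # dict.fromkeys drops duplicates and keeps first-seen order;
        # it requires the values to be hashable (e.g. strings).
        values = list(dict.fromkeys(values))
    return values


# Made-up sample in the assumed shape; these are not real UMLS CUIs.
sample_output = [
    [{"cui": "C0000001", "term": "chest pain"},
     {"cui": "C0000002", "term": "pain"}],
    [{"cui": "C0000001", "term": "chest pain"}],
]

print(extract_field(sample_output, "cui"))
print(extract_field(sample_output, "term", unique=False))
```

Keeping the order stable makes results reproducible between runs, which list(set(...)) does not guarantee.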

gah-bo avatar Apr 23 '23 19:04 gah-bo