
Unable to serialize medspacy Doc object that contains Sections

Open · RonWilkinson opened this issue 4 years ago

Trying to serialize medspacy documents with spaCy's doc.to_bytes() or doc_bin.add() methods fails when the doc contains Sections. The error message is:

TypeError: can not serialize 'Section' object

My understanding is that one can write custom serialization/deserialization methods for a class, which spaCy will then invoke automatically. I would imagine this could be done for the Section object.
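A minimal sketch of that idea, in plain Python so it runs without medspacy installed. spaCy serializes doc.user_data (where values of custom extension attributes set with a default are stored) with msgpack, which only understands built-in types such as dicts, lists, strings and numbers. The usual approach is therefore to convert the object to such a structure on the way out and rebuild it on the way in. The Section class below is a hypothetical stand-in whose field names are assumptions, not medspacy's actual attributes, and json stands in for msgpack because both reject arbitrary objects in the same way.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Section:
    """Hypothetical stand-in for medspacy's Section; field names are assumptions."""
    category: Optional[str]
    title_start: int
    title_end: int
    body_start: int
    body_end: int

    def to_dict(self) -> dict:
        # Only built-in types survive msgpack/json, so flatten to a plain dict.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "Section":
        # Rebuild the object from the plain dict produced by to_dict().
        return cls(**data)


def dump_sections(sections):
    """Serialize a list of Section objects via their dict form."""
    return json.dumps([s.to_dict() for s in sections])


def load_sections(payload):
    """Inverse of dump_sections(): parse the JSON and rebuild Section objects."""
    return [Section.from_dict(d) for d in json.loads(payload)]


sections = [Section("past_medical_history", 0, 4, 4, 30)]

# Passing the objects directly fails, much like the msgpack error above.
try:
    json.dumps(sections)
except TypeError as err:
    print(err)  # Object of type Section is not JSON serializable

restored = load_sections(dump_sections(sections))
print(restored == sections)  # True
```

Hooking this into spaCy itself would mean doing the same conversion at the serialization boundary (for example, storing the dict form in the extension attribute instead of the object); this sketch only illustrates the round-trip.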

RonWilkinson, Apr 13 '22 21:04

I'm running into a similar issue with medspacy 0.2.0.0.

For example, when I try to serialize the default model after adding a target rule, I get the error AttributeError: 'PyRuSHSentencizer' object has no attribute 'punct_chars':

import medspacy
from medspacy.ner import TargetRule

nlp = medspacy.load()

# Add a custom target rule to the default target matcher
target_matcher = nlp.get_pipe("medspacy_target_matcher")
target_rules = [
    TargetRule("fever, unspecified", "PROBLEM")
]
target_matcher.add(target_rules)

# Saving the modified pipeline raises the AttributeError above
nlp.to_disk("path/to/save")

kelshmo, May 30 '22 03:05

Hi all,

I am facing a similar issue (Issue Link). Can you please let me know if there is a way to use to_bytes() or to_disk() with a medspacy model?

Regards, Hari

hariprakashamk, Dec 19 '22 10:12

Hi everyone, I'll check with the team on the latest status of model serialization and will raise the issue today. In the meantime, could you exclude the sentencizer when saving and add it back in when you reinstantiate the model? Do the other components serialize correctly?

abchapman93, Dec 19 '22 15:12

Hi sir, I am just running

import medspacy
nlp = medspacy.load()

and then

nlp.to_disk("path/to/save")

but I am not able to save the model to local disk. Please help!

hariprakashamk, Dec 19 '22 15:12

Hi @hariprakashamk, could you try running:

import medspacy
import spacy

# Load the default medspacy pipeline
nlp = medspacy.load()

# Remove PyRuSH and save to disk
nlp.remove_pipe("medspacy_pyrush")
nlp.to_disk("path/to/save")

# Read back in and add PyRuSH
nlp2 = spacy.load("path/to/save")
nlp2.add_pipe("medspacy_pyrush", first=True)
print(nlp2.pipe_names)

I ran this and it worked. I assume you're not modifying the PyRuSH component, so you don't necessarily need to serialize it with the rest of the model?

abchapman93, Dec 19 '22 16:12

Hi @abchapman93 sir, thank you so much. It worked for me, and it is indeed a great help. Thank you so much once again, sir. Regards, Hariprakasham

hariprakashamk, Dec 19 '22 17:12