
Setting an extension attribute on one span changes it on the other

Open DSLituiev opened this issue 3 years ago • 1 comments

Problem

I am working with a two-level NER taxonomy, where I store the first level in the Span.label_ attribute and the second level in an extension, Span._.type. My annotations come from software that allows spans to overlap, and I am writing a script that reconciles the overlapping annotations. After banging my head for a while, I realized that spaCy behaves quite oddly with extensions. The behavior of .label_ suggests that two spans over the same token range are separate objects, but the extensions behave as if both spans were the same object. I find this quite odd.

How to reproduce the behaviour

import spacy
from spacy.tokens import Span

# Register the second-level taxonomy as a custom extension on Span
Span.set_extension("type", default=None)

nlp = spacy.load("en_core_web_md")
text = "lives with husband"
doc = nlp(text)

# First span over the token "husband"
span1 = doc[2:3]
span1.label_ = "social_support"
span1._.type = "has_support"

# Second span over the same token range, with a different label and type
span2 = doc[2:3]
span2.label_ = "marital_status"
span2._.type = "married"

Now, I would expect these to be two separate span objects, each with its own label and extension attributes, but that is only half true:

print(span1.label_, span1._.type)
# social_support married
#                ^^^^^^^ setting the extension on span2 also changed span1!

print(span2.label_, span2._.type)
# marital_status married

Info about spaCy

  • spaCy version: 3.2.0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.9.7
  • Pipelines: en_core_web_md (3.2.0), en_core_web_sm (3.2.0)

DSLituiev avatar Dec 17 '21 01:12 DSLituiev

Yes, the custom extensions currently only use the span start/end and not any other attributes to distinguish spans. There's a related PR in progress #9708, but some of the serialization details are tricky in terms of backwards compatibility.
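
To make the explanation above concrete, here is a minimal toy model of that behavior. It is only a sketch: `ToyDoc`, `ToySpan`, `set_ext` and `get_ext` are hypothetical names made up for illustration, not spaCy's actual classes or API. The assumption is simply that extension values live in one store on the document, keyed by the extension name plus the span boundaries, while the label lives on each span object.

```python
# Toy model only: these classes are hypothetical and are NOT spaCy's implementation.


class ToyDoc:
    def __init__(self, words):
        self.words = words
        # One shared store for every extension value set on any span of this doc.
        self.user_data = {}


class ToySpan:
    def __init__(self, doc, start, end, label=""):
        self.doc = doc
        self.start = start
        self.end = end
        self.label = label  # stored on the span object itself

    def _key(self, name):
        # Only the extension name and the boundaries go into the key;
        # the label is not part of it.
        return (name, self.start, self.end)

    def set_ext(self, name, value):
        self.doc.user_data[self._key(name)] = value

    def get_ext(self, name, default=None):
        return self.doc.user_data.get(self._key(name), default)


doc = ToyDoc(["lives", "with", "husband"])

span1 = ToySpan(doc, 2, 3, label="social_support")
span1.set_ext("type", "has_support")

span2 = ToySpan(doc, 2, 3, label="marital_status")
span2.set_ext("type", "married")  # same key as span1, so the value is overwritten

print(span1.label, span1.get_ext("type"))
print(span2.label, span2.get_ext("type"))
```

In this model the two spans keep their own labels, but both read the extension value from the same entry in the shared store, which matches the output reported in the issue.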

adrianeboyd avatar Dec 17 '21 07:12 adrianeboyd

The newer PR didn't get linked to this issue at the time, but we decided to move these changes to v4 and #11429 fixes this bug.

adrianeboyd avatar May 04 '23 07:05 adrianeboyd

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] avatar Jun 04 '23 00:06 github-actions[bot]