Thoughts on Underscore class for Spans with the same boundaries?

Open nrodnova opened this issue 2 years ago • 3 comments

How to reproduce the behaviour

Whether two Spans are equal is determined by (start, end, label_, kb_id_), but the Underscore class stores extension values keyed only on (start, end). Any thoughts on that?

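import spacy
from spacy.tokens import Span

doc = spacy.blank("en")("one two three four")  # any Doc with at least 3 tokens
Span.set_extension("test", default=None)  # register the custom attribute first
span_1 = Span(doc, 0, 3, label="LABEL_1", kb_id="KB_ID_1")
span_2 = Span(doc, 0, 3, label="LABEL_2", kb_id="KB_ID_2")
assert span_1 != span_2  # the spans differ by label and kb_id
span_1._.test = "span_1._.text"
print(span_2._.test)  # reads back the value that was set on span_1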

The output is the value that was set on span_1, not on span_2:

 span_1._.text

This poses a problem now that we have SpanGroups and overlapping Spans. I think it can be fixed pretty easily, and I can do it, but I want to hear the maintainers' thoughts first.

nrodnova Nov 19 '21 14:11

I suspect that we're not going to want to make the Span extensions more complicated. It's useful that Doc, Span, and Token extensions are all keyed on the same index values and all have the same internal shape.
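To make the "same internal shape" point concrete, here is a minimal plain-Python model, not spaCy's actual implementation. The `make_key` helper, the `(name, start, end)` tuple, and the `store` dict are all hypothetical names chosen for illustration. The model only assumes what this thread states: values are keyed on index values and never on a label or KB ID.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical key shape used for illustration: (extension name, start, end).
Key = Tuple[str, Optional[int], Optional[int]]


def make_key(name: str, start: Optional[int] = None, end: Optional[int] = None) -> Key:
    """Build one storage key that serves doc-, token- and span-level values."""
    return (name, start, end)


# One flat store, standing in for per-document storage of extension values.
store: Dict[Key, Any] = {}

store[make_key("test")] = "doc value"         # whole document: no offsets
store[make_key("test", 0)] = "token value"    # token: start only
store[make_key("test", 0, 3)] = "span value"  # span: start and end

# A second span with the same boundaries but a different label maps to the
# same key, so this write replaces the earlier span value.
store[make_key("test", 0, 3)] = "other span value"
```

Under this model the uniform key is what keeps the three extension types interchangeable, and it is also exactly why two spans with equal boundaries cannot hold separate values.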

adrianeboyd Nov 19 '21 14:11

Do you have any suggestions/thoughts on how to deal with the situation then? :) Meaning, avoiding overwriting Span's extension values for unequal Spans with the same boundaries?

nrodnova Nov 19 '21 14:11

Hmm, the only workarounds I can think of off the top of my head involve keying the internal extension values based on the labels and kb_ids somehow, but I'm not sure how well this would work in practice.

If you've already implemented it, you can open a PR and we can have a look? But I think it's going to be hard to do without affecting the Doc and Token extensions?
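One way to picture the label-and-kb_id keying is a small user-side helper that sits beside the `._.` mechanism rather than changing it. This is a hedged sketch, not the change any PR made: `SpanLike`, `span_key`, `set_value` and `get_value` are hypothetical names, and `SpanLike` is a stand-in exposing only the four attributes named in the issue. With real spaCy objects, a `Span` and the `doc.user_data` dict could be passed in instead, since the helper only reads `start`, `end`, `label_` and `kb_id_`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class SpanLike:
    """Stand-in for a span: only the attributes used for keying."""
    start: int
    end: int
    label_: str = ""
    kb_id_: str = ""


def span_key(name: str, span: Any) -> Tuple[str, int, int, str, str]:
    """Key a value on the boundaries plus label and KB ID (hypothetical scheme)."""
    return (name, span.start, span.end, span.label_, span.kb_id_)


def set_value(user_data: Dict, name: str, span: Any, value: Any) -> None:
    user_data[span_key(name, span)] = value


def get_value(user_data: Dict, name: str, span: Any, default: Any = None) -> Any:
    return user_data.get(span_key(name, span), default)


user_data: Dict = {}
span_1 = SpanLike(0, 3, "LABEL_1", "KB_ID_1")
span_2 = SpanLike(0, 3, "LABEL_2", "KB_ID_2")

set_value(user_data, "test", span_1, "span_1 value")
result_1 = get_value(user_data, "test", span_1)  # finds the stored value
result_2 = get_value(user_data, "test", span_2)  # different key, falls back to default
```

Because the helper keeps its own keys, Doc and Token extensions are untouched. The trade-off is that a value becomes unreachable if the span's label or KB ID changes after it was stored.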

adrianeboyd Nov 19 '21 14:11

I think this is resolved by #11429.

adrianeboyd May 05 '23 06:05
