Add spancat_exclusive pipeline for non-overlapping span labelling tasks
Context
The current spancat implementation always treats the span labelling task as a multilabel problem: it uses a Logistic layer to output class probabilities independently for each class. However, when presented with a multiclass problem (exclusive classes), not using the correct modeling assumptions can be a disadvantage. The spancat_exclusive component uses a Softmax layer instead.
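As a minimal sketch of the difference (illustrative only, not the actual spancat model configs): the multilabel scorer ends in a Logistic layer, while the exclusive variant ends in a Softmax layer.

```python
# Hedged sketch of the two output layers, assuming 4 classes; not the real spancat architectures.
from thinc.api import Linear, Logistic, Softmax, chain

n_classes = 4
# Multilabel spancat: an independent sigmoid per class, so several labels can fire at once.
multilabel_scorer = chain(Linear(nO=n_classes), Logistic())
# Exclusive variant: a Softmax over the classes, so probabilities sum to 1 and exactly
# one label "wins" per span.
exclusive_scorer = Softmax(nO=n_classes)
```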
Description
This PR adds another pipeline, spancat_exclusive, to account for exclusive classes in span categorization tasks. It does this by introducing the concept of a "negative label" or "no label." In spancat, the number of span labels is exactly the same as what's found in a dataset's annotation. Here, we add another column to account for the negative label.
- We didn't touch the `add_label` implementation; it is the same as `spancat`'s. Instead, we implemented two additional properties: `_negative_label` (returns the index of the negative label) and `_n_labels` (returns the length of `label_data` + 1), and changed `initialize` to create a `Softmax` layer with the extra negative label (a rough sketch follows this list).
- This in turn affects how the annotations are created during inference (cf. `set_annotations`). Again, we modified `_make_span_group` to accommodate this change.
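A minimal, hypothetical sketch of the two properties described in the first bullet, where `label_data` stands in for the component's stored labels (the actual implementation may differ):

```python
# Illustration of the negative-label bookkeeping; not the actual spaCy code.
class SpanCategorizerExclusiveSketch:
    def __init__(self, label_data):
        self.label_data = list(label_data)

    @property
    def _negative_label(self) -> int:
        """Index of the extra 'no label' column (always the last column)."""
        return len(self.label_data)

    @property
    def _n_labels(self) -> int:
        """Number of output classes, including the negative label."""
        return len(self.label_data) + 1
```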
Technical Explanation of Changes
⏯️ Training: how is the loss computed this time? (also a note about the negative_weight param)
During initialization, we pass the number of labels (n_labels + 1) so that the score matrix has shape (n_samples, n_labels + 1), where the +1 accounts for the negative label. At training time, the score matrix should already account for the negative label. In this implementation, the negative label is always in the last column.
Figure: Simple example using CoNLL labels (ORG, MISC, PER, LOC)
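As a small stand-in for the layout the figure illustrates (shapes and values are illustrative only):

```python
# Score matrix over the CoNLL labels plus the trailing negative ("no label") column.
import numpy

labels = ["ORG", "MISC", "PER", "LOC"]             # n_labels = 4
n_spans = 2
scores = numpy.zeros((n_spans, len(labels) + 1))   # shape (n_samples, n_labels + 1)
# scores[:, :4] -> class probabilities for ORG, MISC, PER, LOC
# scores[:, 4]  -> probability of the negative label (always the last column)
```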

In the get_loss() function, we then assign the value 1.0 to the last column whenever a particular span is a negative example.
```python
# spancat_exclusive.py::SpanCategorizerExclusive.get_loss()
# `target` is the gold score matrix of shape (n_samples, n_labels + 1);
# `negative_spans` marks candidate spans that have no gold label.
target = self.model.ops.asarray(target, dtype="f")  # type: ignore
negative_samples = numpy.nonzero(negative_spans)[0]
# Set the negative-label column (always the last one) to 1.0 for negative examples.
target[negative_samples, self._negative_label] = 1.0
```
We then compute the scores and loss for backprop as usual (i.e., d_scores = scores - target). We also added the option of specifying a negative_weight to control the effect of the negative class (a form of class weighting). Higher values (>1) increase the effect of the negative class, while lower values (<1) reduce it.
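A minimal sketch of this gradient step, assuming negative_weight is applied by rescaling the gradient of the negative examples (the exact placement in the real implementation may differ):

```python
# Illustrative only; mirrors the description above rather than the actual spaCy code.
import numpy

def get_gradient(scores, target, negative_spans, negative_weight=1.0):
    # scores, target: float arrays of shape (n_spans, n_labels + 1)
    # negative_spans: boolean mask of spans that have no gold label
    d_scores = scores - target
    # Class weighting: up- or down-weight the contribution of negative examples.
    negative_samples = numpy.nonzero(negative_spans)[0]
    d_scores[negative_samples] *= negative_weight
    loss = float((d_scores ** 2).sum())
    return d_scores, loss
```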
⏯️ Inference: how are the annotations predicted? (also a note about the allow_overlap param)
During inference, we drop the spans for which the negative label is the predicted class. In addition, if the allow_overlap parameter is set to False, overlapping spans are not stored; only the span with the highest predicted probability is kept. Overlaps are tracked with the Ranges data structure.
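A simplified sketch of this filtering, assuming plain (start, end) tuples instead of spaCy's Ranges helper and the component's internal label mapping:

```python
# Illustrative inference-time filtering; not the actual spaCy implementation.
import numpy

def filter_predictions(scores, labels, negative_label_index, spans, allow_overlap=True):
    # scores: array of shape (n_spans, n_labels + 1); spans: list of (start, end) offsets
    predicted = scores.argmax(axis=1)
    keep = []
    occupied = []  # ranges already claimed by a higher-scoring span
    # Visit spans from highest to lowest predicted probability.
    for i in numpy.argsort(-scores.max(axis=1)):
        if predicted[i] == negative_label_index:
            continue  # the negative label won: no annotation for this span
        start, end = spans[i]
        if not allow_overlap and any(s < end and start < e for s, e in occupied):
            continue  # overlapping spans are dropped, keeping the highest-scoring one
        occupied.append((start, end))
        keep.append((start, end, labels[predicted[i]]))
    return keep
```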
⏯️ Testing on other datasets [WIP]
TODO - Compare spancat and spancat_exclusive on some datasets
Types of change
- Feature implementation
- New pipeline for exclusive spancat
- Tests and documentation
Checklist
- [ ] I confirm that I have the right to submit this contribution under the project's MIT license.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
I think implementation-wise this PR can be reviewed. We'd definitely still want to run a few experiments comparing spancat_exclusive and spancat on a number of datasets. I'm not sure if I should do that first, or perhaps you want to take a look at the current code before we run experiments. cc: @kadarakos @adrianeboyd
Hi! I've added a benchmark that tests spancat_exclusive and spancat on a number of NER datasets. Overall, the former seems to work well (across three trials, reporting the average and standard deviation). I'd like another round of review to check the implementation. 🙇
You can find the benchmarking project here. It's my fork of explosion/projects. Once spancat_exclusive has been merged, I can also open another PR to explosion/projects to include this benchmark.
Can you add tests for this similar to the existing spancat tests?
Hi! I extended the tests from spancat to also include spancat_exclusive. I also updated the website documentation for spancat to mention additional parameters from spancat_exclusive.
@explosion-bot please test_gpu
🪁 Successfully triggered build on Buildkite
URL: https://buildkite.com/explosion-ai/spacy-gpu-test-suite/builds/117
@ljvmiranda921 : could you have a look at the conflicts? 🙏
It looks like it needs to be updated to handle empty docs.
Ok will do that!
Saw the updates and the merge conflict. I'll solve it first thing tomorrow!
Done! I saw the comments regarding the docstrings for spancat.py. I can include the changes here unless we want it in a different PR!
> I saw the comments regarding the docstrings for `spancat.py`. I can include the changes here unless we want it in a different PR!
These are pretty minimal fixes; I don't think you need to open a new PR for them. It's basically making sure things are consistent across the two components.
A few things related to the overall config/design vs. spancat:
- this feels like it should be what happens for `max_positive = 1` in `spancat`
- wouldn't it make sense for `allow_overlap = false` to also be an option for `spancat`?
In general it feels like this functionality should be possible with options for one SpanCategorizer class and the only difference for the spancat_exclusive factory (or whatever it's called) is that there is a different model in the default config. (I'm not entirely sure, but I do think it's going to be tricky to do all this with one spancat config because you can't parameterize the default model.)
I agree that `max_positive = 1` basically implies `spancat_exclusive`. I also agree that `allow_overlap` is not specific to `spancat_exclusive`.
But the difference is not only Logistic vs. Softmax in the output layer: `get_loss` is also different because of the negative class. In addition, `spancat_exclusive` has the `negative_weight` parameter, whereas `spancat` has `threshold`. So if `SpanCategorizer` implemented both, `threshold` would go unused for `max_positive = 1`, and `negative_weight` should only be used when `max_positive = 1`.
Should we merge the two implementations into a single SpanCategorizer first and see how that looks?
The spancat_exclusive.py file will be deleted if we agree on the design. I quite like that there is one SpanCategorizer and two factories.
Just a couple small notes, not a thorough review yet
I still need to add make_span_group_singlelabel and make_span_group_multilabel tests to see if they work in the add_negative_label case.
Encountered a bit of a problem. When add_negative_label = True, I need to access self._negative_label_index, but _make_span_group takes labels: List[str], so I don't know which one is supposed to be the negative label. In general, I'm not sure why the method ended up like this originally: why allow _make_span_group to take an arbitrary list as labels, one that doesn't actually have to match the scores and indices coming from the model?
I can probably work around it somehow, but what is already there feels kind of clunky to begin with. I don't think we should use this arbitrary List[str]; it would be better to use the component's own self.labels or self.label_map. But maybe I'm missing a use case where this flexibility is nice? If so, why not have make_span_group in utils take all the different kinds of arguments, but make sure that the SpanCategorizer calls it with something meaningful? But then this would be breaking :( .
I think it makes sense to refactor _make_span_group to remove the labels argument.
> I think it makes sense to refactor `_make_span_group` to remove the `labels` argument.
Okay, that's nice! I'm getting somewhere with it, but in the meantime I also realized that I don't necessarily understand the reason for having the max_positive argument in the multilabel case either. We already have the threshold. So why would you say "keep only the top-3 labels that pass the threshold"? Why not all labels that pass the threshold?
This came up because, in the add_negative_label case, if the negative label is in the top 3, the current code returns only the top 2, so I would then have to look for more options that pass the threshold. It just made me wonder: why not return all of them? What's the use case?
I think the last thing for the docs is to figure out how to mark all the new things clearly as new.
> I think the last thing for the docs is to figure out how to mark all the new things clearly as new.
How do we usually do that? We add an indication of which version a feature is available from, right?
Yes, there would be new tags from 3.5.1. What I'm not sure is exactly how best to mark spancat_singlelabel on the combined API docs page.
I added the <Tag variant="new">3.5.1</Tag> in some places where I felt it communicates more or less unambiguously what is new, but I wasn't exactly sure.
The markers should go next to the setting name in the settings column rather than in the descriptions.
Could you also revert all the formatting-only changes to the other .mdx files?
I think this is good to go!