
Alternative solutions for NER with N>>1 entity types

FrancescoCasalegno opened this issue 4 years ago · 1 comment

Scope

If we need to implement a NER system supporting mining of N different entity types, we can do so in different ways. In particular, when N >> 1, the different strategies have different drawbacks.

  1. Train N separate models, each of them capable of extracting only 1 entity type.
    • We may need to run up to N models on the same text, at least once per text; after that, the results can be cached.
    • There is a significant risk of several models extracting overlapping entities; how do we handle this? (See the sketch after this list for one possible resolution heuristic.)
  2. Train 1 huge model, capable of extracting N entity types.
    • If the choice of the pre-trained base model turns out to have a big impact on the final performance (see #294), then this strategy may produce models with sub-optimal accuracy, since we have to choose a single base model for all entity types.
    • What if some entities actually overlap in the ground truth, e.g. nested entities? Then this kind of model is not able to address such cases.
  3. Train K models, where the k-th model is capable of extracting N_k entity types and N_1 + ... + N_K = N.
    • This is the strategy we have been using so far.
    • Drawbacks of Approach 1 also apply here.
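As a rough sketch of how the overlap problem from Approach 1 could be handled, here is one possible heuristic (keep the highest-confidence span, drop anything that overlaps it). The `Span` class, `resolve_overlaps` function, and entity labels below are illustrative names, not existing code:

```python
from dataclasses import dataclass

@dataclass
class Span:
    """An entity span predicted by one of the N single-type models."""
    start: int     # character offset (inclusive)
    end: int       # character offset (exclusive)
    label: str     # entity type, e.g. "CELL_TYPE"
    score: float   # model confidence

def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Greedy resolution: keep the highest-scoring span,
    discard any span overlapping one already kept, repeat."""
    kept: list[Span] = []
    for span in sorted(spans, key=lambda s: s.score, reverse=True):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)

# Example: two single-type models disagree on "T cell receptor".
predictions = [
    Span(0, 6, "CELL_TYPE", 0.91),   # "T cell"
    Span(0, 15, "PROTEIN", 0.97),    # "T cell receptor"
]
print(resolve_overlaps(predictions))  # keeps only the PROTEIN span
```

Other policies (longest-span-wins, fixed type-priority rules) would fit the same interface; which one is right probably depends on the downstream use of the extracted entities.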

We should do some research on how others are handling these problems, and also run our own tests and evaluate the results before adopting the best strategy. When choosing our strategy, we should keep in mind that the solution must be flexible and easy to extend and fine-tune (see #293).

FrancescoCasalegno avatar Mar 15 '21 14:03 FrancescoCasalegno

Regarding 1): Even though this setup might seem extreme, one possibility is to have a shared backbone BERT encoder (we can take BioBERT or even the model we use for sentence embeddings) and then just have a per-entity-type token classifier predicting O, B, or I. We never touch the weights of the backbone and only train the token classifiers. A rough sketch of this idea is below.
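A minimal sketch, assuming the Hugging Face transformers library and the dmis-lab/biobert-base-cased-v1.1 checkpoint; the entity types and head layout are illustrative, not decided:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# Shared backbone (BioBERT here; any BERT-like encoder would work).
backbone = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
for param in backbone.parameters():
    param.requires_grad = False  # never touch the backbone weights

NUM_TAGS = 3  # O, B, I
entity_types = ["CELL_TYPE", "PROTEIN"]  # illustrative; one head per type

# One lightweight token classifier per entity type.
heads = nn.ModuleDict({
    etype: nn.Linear(backbone.config.hidden_size, NUM_TAGS)
    for etype in entity_types
})

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
batch = tokenizer(["EGFR is expressed in T cells."], return_tensors="pt")

with torch.no_grad():  # backbone is frozen, so no gradients needed here
    hidden = backbone(**batch).last_hidden_state  # (1, seq_len, hidden_size)

# Each head produces independent O/B/I logits over the same encoding,
# so the encoder runs only once per text regardless of N.
logits = {etype: head(hidden) for etype, head in heads.items()}
# Training would apply a per-head cross-entropy loss on that head's labels.
```

Since only the linear heads are trained, adding a new entity type means training one small classifier rather than fine-tuning a full model, which also addresses the extensibility concern from #293.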

jankrepl avatar Mar 18 '21 16:03 jankrepl