haystack Extend support for token classification

Is your feature request related to a problem? Please describe.
It would be great if support for token classification could be extended beyond what the Extractor currently offers. Specifically, we also need training and evaluation for token classification models. The node should also support splitting longer texts and aggregating the per-chunk predictions, to work around the 512-token limit present in most language models.

Describe the solution you'd like
Extension / re-implementation of the Extractor node to support the additional features.
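
To make the splitting and aggregation request concrete, here is a minimal sketch of one possible approach: overlapping windows over the token sequence, with overlaps resolved in favour of the window where the token sits farther from an edge. This is not the Extractor's actual implementation; `split_into_windows`, `merge_window_labels`, and the `max_len` / `stride` defaults are hypothetical names and values chosen for illustration, and a real model would also need room reserved for special tokens.

```python
from typing import List, Tuple


def split_into_windows(n_tokens: int, max_len: int = 512, stride: int = 128) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) spans covering n_tokens, overlapping by `stride` tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    if n_tokens == 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        # Step forward, keeping `stride` tokens of context from the previous window.
        start = end - stride
    return spans


def merge_window_labels(
    spans: List[Tuple[int, int]], window_labels: List[List[str]], n_tokens: int
) -> List[str]:
    """Stitch per-window labels back into one sequence.

    Where windows overlap, keep the label from the window in which the token
    is farthest from a window edge (ties go to the earlier window).
    """
    merged = ["O"] * n_tokens
    best_margin = [-1] * n_tokens
    for (start, end), labels in zip(spans, window_labels):
        for offset, label in enumerate(labels):
            i = start + offset
            margin = min(offset, end - 1 - i)  # distance to the nearest window edge
            if margin > best_margin[i]:
                best_margin[i] = margin
                merged[i] = label
    return merged


# Usage: 10 tokens, windows of 4 tokens overlapping by 2.
spans = split_into_windows(10, max_len=4, stride=2)
```

Preferring the window with more surrounding context is only one heuristic; averaging the per-window logits over the overlap would be another option.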

Aug 04 '22 15:08 mathislucka

Additionally, we want to consider different postprocessing strategies for combining the predicted labels into entities. For example, the prediction ["B-DEFENDER", "I-DEFENDER"] will be combined into one entity, but what should be done with a prediction like ["O", "I-DEFENDER", "O"]?
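
As a rough illustration of what such strategies could look like, the sketch below groups BIO labels into entity spans and exposes a switch for the orphan `I-` case. The function name `aggregate_bio` and the strategy names `"start"` and `"drop"` are assumptions made up for this example, not an existing Haystack API.

```python
from typing import List, Tuple


def aggregate_bio(labels: List[str], orphan_strategy: str = "start") -> List[Tuple[str, int, int]]:
    """Group BIO labels into (entity_type, start, end_exclusive) spans.

    orphan_strategy (hypothetical option names):
      "start": an I-X with no preceding B-X / I-X begins a new entity
      "drop":  such an I-X run is discarded
    """
    if orphan_strategy not in ("start", "drop"):
        raise ValueError(f"unknown orphan_strategy: {orphan_strategy!r}")

    entities: List[Tuple[str, int, int]] = []
    open_type, open_start = None, None  # entity currently being built
    dropped_type = None                 # type of an orphan run being discarded

    def close(end: int) -> None:
        nonlocal open_type, open_start
        if open_type is not None:
            entities.append((open_type, open_start, end))
        open_type, open_start = None, None

    for i, label in enumerate(labels):
        prefix, _, etype = label.partition("-")
        if prefix == "B":
            close(i)
            open_type, open_start = etype, i
            dropped_type = None
        elif prefix == "I":
            if open_type == etype:
                continue  # extends the entity being built
            close(i)
            if dropped_type == etype:
                continue  # still inside an orphan run that is being dropped
            if orphan_strategy == "start":
                open_type, open_start = etype, i
            else:
                dropped_type = etype
        else:  # "O" or any unrecognised label ends the current entity
            close(i)
            dropped_type = None
    close(len(labels))
    return entities


# The two cases from the comment above:
print(aggregate_bio(["B-DEFENDER", "I-DEFENDER"]))
print(aggregate_bio(["O", "I-DEFENDER", "O"], orphan_strategy="start"))
print(aggregate_bio(["O", "I-DEFENDER", "O"], orphan_strategy="drop"))
```

Which behaviour is preferable likely depends on the use case: `"start"` favours recall, while `"drop"` favours precision.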

Aug 05 '22 13:08 sjrl

@sjrl was this resolved by #3154?

Nov 02 '22 07:11 masci

Hi @masci, PR #3154 partially resolves this issue. The PR did not add the training and evaluation of token classification models. I can edit the text of the main issue to better reflect the remaining tasks.

Nov 14 '22 15:11 sjrl