
Support sequence tagging evaluation metrics (NLP)

Open pietrolesci opened this issue 2 years ago • 6 comments

🚀 Feature

Support for sequence tagging evaluation metrics à la seqeval. That is, support the evaluation of the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling and so on.

pietrolesci avatar Jul 22 '22 12:07 pietrolesci

cc @stancld opinion on this?

SkafteNicki avatar Jul 22 '22 13:07 SkafteNicki

I'm not so familiar with this kind of metric. How much do these metrics differ from standard classification ones? :] @pietrolesci

stancld avatar Jul 23 '22 15:07 stancld

Hi @stancld,

I think it's not much different. The convenience of having sequence-level metrics already available is that

  • they can be fed sequences directly (without manual iteration)
  • they can implement different evaluation "policies": "strict" vs. non-strict. For example,
pred: [A, A, B]
true: [A, B, B]

can be considered either partially correct or entirely incorrect. This, of course, affects how results are aggregated. A practical example can be found in the README.md.

  • it can be easier to enforce particular encodings for the NER or POS tags (for example)
  • last but not least, it would be nice to have it in torchmetrics for consistency (i.e., no need to resort to other libraries/frameworks)
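To make the "strict" vs. non-strict point above concrete, here is a minimal pure-Python sketch (not torchmetrics or seqeval code; the function names are mine) that scores the same BIO-tagged predictions two ways: token-level accuracy, where each position is judged independently, and strict entity-level F1, where a predicted span only counts if both its type and its boundaries match exactly:

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from a BIO-tagged sequence.

    Spans begin at B-* tags; a span is closed by "O", by another B-*,
    or by an I-* tag whose type differs from the open span's type.
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes any open span
        if tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and etype != tag[2:]
        ):
            if start is not None:
                spans.append((etype, start, i))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def strict_f1(true_seqs, pred_seqs):
    """Entity-level F1: a span counts only if type AND boundaries match."""
    tp = fp = fn = 0
    for t, p in zip(true_seqs, pred_seqs):
        ts, ps = set(extract_entities(t)), set(extract_entities(p))
        tp += len(ts & ps)
        fp += len(ps - ts)
        fn += len(ts - ps)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


def token_accuracy(true_seqs, pred_seqs):
    """Non-strict, token-level view: each position is scored independently."""
    pairs = [(t, p) for ts, ps in zip(true_seqs, pred_seqs)
             for t, p in zip(ts, ps)]
    return sum(t == p for t, p in pairs) / len(pairs)


true = [["B-PER", "I-PER", "B-LOC"]]
pred = [["B-PER", "B-LOC", "B-LOC"]]
# Token view: 2 of 3 tags match. Strict view: only the LOC span at
# position 2 matches exactly; the PER span's boundaries differ.
```

The gap between the two numbers (token accuracy 2/3 vs. strict F1 0.4 here) is exactly why baking the aggregation policy into the metric matters.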

pietrolesci avatar Jul 23 '22 15:07 pietrolesci

Hi @pietrolesci, I get the motivation and think this might be a nice contribution to torchmetrics. 👍

As these metrics will very likely inherit from the classification ones, I'd just wait a bit with this addition until the ongoing classification refactor #1001 is finalized :]

stancld avatar Jul 27 '22 20:07 stancld

Hi @pietrolesci -- I think I should be able to find some time in the near future to look at this class of metrics. However, I'm not fully familiar with the current state of tagging metrics. Do you think it would make more sense for our public API to accept something like Sequence[Sequence[str]], or is it better to use torch.Tensor here? (Transformer models tend to output tensors, so that would make sense as well.) Alternatively, we could support both options and make sure everything is converted to tensors internally (provided this doesn't make our public API too confusing). What do you think? :] cc: @Borda @SkafteNicki
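One way to read the "support both" option: a small normalization step in front of the metric. The sketch below is hypothetical (the helper name, signature, and `id2label` parameter are mine, not part of torchmetrics); it accepts either nested string tags or integer class ids (e.g. the argmax of a model's logits, converted to Python ints via `.tolist()`) and canonicalizes everything to `List[List[str]]` before metric computation:

```python
from typing import List, Mapping, Optional, Sequence, Union

# A batch of tag sequences: either strings ("B-PER", "O", ...) or
# integer class ids produced by a model.
TagBatch = Union[Sequence[Sequence[str]], Sequence[Sequence[int]]]


def normalize_tags(
    batch: TagBatch,
    id2label: Optional[Mapping[int, str]] = None,
) -> List[List[str]]:
    """Canonicalize a batch of tag sequences to nested lists of strings."""
    out: List[List[str]] = []
    for seq in batch:
        row: List[str] = []
        for tag in seq:
            if isinstance(tag, str):
                row.append(tag)
            else:
                # Integer ids require a mapping back to label strings.
                if id2label is None:
                    raise ValueError("id2label is required for integer tag ids")
                row.append(id2label[int(tag)])
        out.append(row)
    return out
```

Converting the other way (strings to tensors of ids) would work just as well internally; the point is only that the user-facing API can accept both shapes while the update/compute logic sees a single canonical form.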

stancld avatar Oct 05 '22 10:10 stancld

I think it would be good to explore this direction; also, we could set up a quick call with @pietrolesci to get more context, and maybe he could give us an intro... :rabbit:

Borda avatar Oct 19 '22 12:10 Borda