argilla icon indicating copy to clipboard operation
argilla copied to clipboard

feat(#1579): validate token classification annotations in client

Open dcfidalgo opened this issue 2 years ago • 1 comments

Closes #1579

Validates the prediction/annotation spans in the client when creating the token classification records. For this, we introduce a new SpanUtils class that also takes care of transforming the spans into tags.

dcfidalgo avatar Aug 09 '22 16:08 dcfidalgo

Comments after the call with @frascuchon :

  • from_tags should support BILOU
  • add a private attribute to Record classes that hold the SpanUtils instance (this avoids the overhead of computing the char to token mappings every time)
  • remove/deprecate token_span and char_id2token_id methods, and private chars2token and tokens2chars attributes
  • create a utils module and with it a span_utils.py, move current utils.py to the new utils folder
  • [x] support BILOU in from_tags
  • [x] add private _span_utils attribute in corresponding token classification record classes
  • [x] remove/deprecate old helper methods
  • [x] create utils module

still missing:

  • [x] add/adapt unit tests

dcfidalgo avatar Aug 09 '22 16:08 dcfidalgo

Ok, this should be ready to review. @frascuchon I left one inline comment, that I think should be corrected before merging.

dcfidalgo avatar Aug 21 '22 14:08 dcfidalgo

Closing.

Merged from #1709

frascuchon avatar Sep 09 '22 13:09 frascuchon