argilla
feat(#1579): validate token classification annotations in client
Closes #1579
Validates the prediction/annotation spans in the client when creating token classification records. For this, we introduce a new SpanUtils class that also takes care of transforming the spans into tags.
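To illustrate the idea, here is a minimal sketch of span validation and span-to-tag conversion. It is not the SpanUtils implementation from this PR: the function names, the `(label, char_start, char_end)` span layout with an exclusive end, and the choice of BIO output are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed span layout: (label, char_start, char_end), with char_end exclusive.
Span = Tuple[str, int, int]


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Locate each token in the text, left to right, as (start, end) char offsets."""
    offsets, cursor = [], 0
    for token in tokens:
        start = text.index(token, cursor)  # raises ValueError if a token is missing
        cursor = start + len(token)
        offsets.append((start, cursor))
    return offsets


def validate_spans(text: str, tokens: List[str], spans: List[Span]) -> None:
    """Raise ValueError if a span is empty or not aligned with token boundaries."""
    offsets = token_offsets(text, tokens)
    starts = {start for start, _ in offsets}
    ends = {end for _, end in offsets}
    misaligned = [
        span for span in spans
        if span[1] >= span[2] or span[1] not in starts or span[2] not in ends
    ]
    if misaligned:
        raise ValueError(f"Spans are not aligned with the tokens: {misaligned}")


def spans_to_tags(text: str, tokens: List[str], spans: List[Span]) -> List[str]:
    """Validate the spans, then convert them to one BIO tag per token."""
    validate_spans(text, tokens, spans)
    offsets = token_offsets(text, tokens)
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        covered = [i for i, (s, e) in enumerate(offsets) if s >= start and e <= end]
        for position, token_idx in enumerate(covered):
            tags[token_idx] = ("B-" if position == 0 else "I-") + label
    return tags
```

Validating before converting means a misaligned span fails loudly in the client instead of silently producing wrong tags.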
Comments after the call with @frascuchon:
- from_tags should support BILOU
- add a private attribute to the Record classes that holds the SpanUtils instance (this avoids the overhead of computing the char-to-token mappings every time)
- remove/deprecate token_span and char_id2token_id methods, and private chars2token and tokens2chars attributes
- create a utils module containing a new span_utils.py, and move the current utils.py into the new utils folder
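As a rough illustration of the first point, the sketch below converts a tag sequence back into character spans and accepts both BIO and BILOU prefixes. The function name `tags_to_spans`, its signature, and the lenient handling of an `I-`/`L-` tag that has no open entity are assumptions for this example, not the behavior of `from_tags` in this PR.

```python
from typing import List, Optional, Tuple


def tags_to_spans(
    tags: List[str], offsets: List[Tuple[int, int]]
) -> List[Tuple[str, int, int]]:
    """Convert BIO or BILOU tags into (label, char_start, char_end) spans.

    `offsets` holds the (start, end) character offsets of each token.
    """
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None  # label of the currently open entity, if any
    start = end = 0

    def close() -> None:
        """Emit the open entity, if there is one."""
        nonlocal label
        if label is not None:
            spans.append((label, start, end))
            label = None

    for tag, (tok_start, tok_end) in zip(tags, offsets):
        if tag == "O":
            close()
            continue
        prefix, _, tag_label = tag.partition("-")
        if prefix not in ("B", "I", "L", "U") or not tag_label:
            raise ValueError(f"Unsupported tag: {tag!r}")
        # B/U always open a new entity; so does I/L when the label changes
        # or nothing is open (a lenient choice made for this sketch).
        if prefix in ("B", "U") or tag_label != label:
            close()
            label, start = tag_label, tok_start
        end = tok_end
        # L and U mark the last token of an entity in BILOU.
        if prefix in ("L", "U"):
            close()
    close()
    return spans
```

Because `L-` and `U-` only add an explicit end marker, a single pass handles both schemes without asking the caller which one is in use.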
- [x] support BILOU in from_tags
- [x] add private _span_utils attribute in corresponding token classification record classes
- [x] remove/deprecate old helper methods
- [x] create utils module
still missing:
- [x] add/adapt unit tests
OK, this should be ready for review. @frascuchon I left one inline comment that I think should be addressed before merging.
Closing.
Merged from #1709