SkillNER

Make text cleaning optional.

Open · ruben-dedoncker opened this issue 2 years ago · 1 comment

Is your feature request related to a problem? Please describe.
The text cleaning step makes it impossible to link annotated spans to character indices in the original text. This in turn makes it impossible to compare the performance of this model with that of other NER models.

Describe the solution you'd like
Make the text cleaning step optional. When the cleaning step is omitted, abv_text == immutable_text.

Describe alternatives you've considered
Provide additional metadata containing the start and end character indices of each annotated span in the original text, in addition to the boundaries in the cleaned text.
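To illustrate the alternative, here is a minimal sketch of a cleaning step that records, for every character of the cleaned text, the index of the character it came from in the original text. The cleaning rules (lowercasing, turning non-alphanumeric characters into a single space) and the names `clean_with_offsets` and `to_original_span` are assumptions made for illustration; they are not SkillNER's actual cleaning functions.

```python
from typing import List, Tuple


def clean_with_offsets(text: str) -> Tuple[str, List[int]]:
    """Clean `text` and return (cleaned_text, offsets).

    offsets[k] is the index in `text` of the character that produced
    cleaned_text[k]. Illustrative rules only: lowercase alphanumerics,
    collapse every run of other characters into one space.
    """
    cleaned: List[str] = []
    offsets: List[int] = []
    prev_space = True  # True at start so leading separators are dropped
    for i, ch in enumerate(text):
        if not ch.isalnum():
            if not prev_space:
                cleaned.append(" ")
                offsets.append(i)
                prev_space = True
            continue
        # lower() may yield more than one character for some code points,
        # so every produced character is mapped back to the same source index
        for c in ch.lower():
            cleaned.append(c)
            offsets.append(i)
        prev_space = False
    # drop a trailing separator, if any
    if cleaned and cleaned[-1] == " ":
        cleaned.pop()
        offsets.pop()
    return "".join(cleaned), offsets


def to_original_span(offsets: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a half-open span [start, end) in the cleaned text to the original text."""
    return offsets[start], offsets[end - 1] + 1


text = "Strong  Python, and Machine-Learning skills!"
cleaned, offsets = clean_with_offsets(text)

# Pretend an annotator matched this span in the cleaned text
start = cleaned.index("machine learning")
end = start + len("machine learning")
orig_start, orig_end = to_original_span(offsets, start, end)
print(text[orig_start:orig_end])
```

With such a mapping, spans found in the cleaned text could be reported in original-text coordinates, which is what a character-level comparison against other NER models needs.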

ruben-dedoncker · Dec 26 '22 11:12

You could instantiate your own empty skillNer.cleaner.Cleaner to bypass text cleaning. However, you would also need to protect abv_text from later processing, which would require some changes in the code.
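As a rough sketch of what an "empty" cleaner means: if a cleaner is a callable that applies a list of cleaning functions in order, then one built with no functions returns its input unchanged. The `PipelineCleaner` class below is a hypothetical stand-in written for illustration, not skillNer.cleaner.Cleaner itself; check the SkillNER source for the real constructor arguments and for where the cleaner is created.

```python
from typing import Callable, Iterable, List


class PipelineCleaner:
    """Hypothetical stand-in for a cleaner: applies functions in order."""

    def __init__(self, functions: Iterable[Callable[[str], str]] = ()):
        self.functions: List[Callable[[str], str]] = list(functions)

    def __call__(self, text: str) -> str:
        for fn in self.functions:
            text = fn(text)
        return text


def remove_extra_space(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


# A cleaner with two illustrative steps, and an empty one that does nothing
default_cleaner = PipelineCleaner([str.lower, remove_extra_space])
empty_cleaner = PipelineCleaner()

raw = "Senior  Data-Engineer (Python)"
print(default_cleaner(raw))  # text is altered, so character offsets shift
print(empty_cleaner(raw))    # text is returned unchanged
```

Bypassing the cleaner in this way only keeps the first processing step from altering the text; as noted above, any later processing of abv_text would still have to be handled separately.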


AnAnalogGuy · Mar 30 '23 19:03