medaCy

[FEATURE REQUEST] Functionality for analyzing the differences between two Annotation objects.

Open AndriyMulyar opened this issue 5 years ago • 2 comments

What problem does your feature solve? A method for analyzing annotations, specifically for examining the differences between gold and predicted annotations.

Describe the solution you'd like The Annotation class should be given static methods such as Annotation.diff(ann_object_1, ann_object_2), which would output the difference between two annotation objects. It could take a leniency parameter to handle fuzzy annotation matching.

Interface with sklearn to compute various evaluation metrics between two annotation files (assuming one is gold and the other is predicted).
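To make the request concrete, here is a minimal sketch of what such a diff could look like. It is not medaCy's API: `Entity`, `diff`, and `scores` are hypothetical names, annotations are assumed to reduce to (label, start, end) triples, and only exact matches count. Precision, recall, and F1 are computed by hand to keep the sketch dependency-free; the same counts could be passed to sklearn instead.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Entity(NamedTuple):
    """Hypothetical stand-in for one annotated entity (not a medaCy class)."""
    label: str
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends


def diff(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Dict[str, Set[Entity]]:
    """Split two annotation sets into matched, missed, and spurious entities."""
    gold_set, pred_set = set(gold), set(predicted)
    return {
        "matched": gold_set & pred_set,   # in both gold and predicted
        "missed": gold_set - pred_set,    # in gold only (false negatives)
        "spurious": pred_set - gold_set,  # in predicted only (false positives)
    }


def scores(result: Dict[str, Set[Entity]]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 from the diff counts."""
    tp = len(result["matched"])
    fp = len(result["spurious"])
    fn = len(result["missed"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [Entity("Drug", 0, 7), Entity("Dose", 8, 13), Entity("Route", 20, 24)]
predicted = [Entity("Drug", 0, 7), Entity("Dose", 8, 12)]
result = diff(gold, predicted)
metrics = scores(result)
```

In this example the predicted "Dose" span is off by one character, so under exact matching it is counted both as a missed gold entity and as a spurious prediction. That is the case a leniency parameter would address.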

Additional context This would be very useful for result analysis and guiding the building of pipelines.

AndriyMulyar, Dec 28 '18 22:12

Pull request #68 begins the preliminary work described above.

Ideas for further improvements:

  1. Add a method to Dataset that compares gold and predicted annotations over a whole corpus, using the diff functionality implemented in #68.
  2. Give the diff method optional fuzzy parameters that highlight model predictions that are almost correct (for example, off by a few characters).
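Both ideas can be sketched together. The code below is an illustration only, not the implementation in #68: `fuzzy_diff`, `corpus_diff`, and the `tolerance` parameter are hypothetical names, and a corpus is assumed to be a plain dict mapping a document name to a (gold, predicted) pair of span lists.

```python
from collections import Counter
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start offset, end offset)


def _close(gold: Span, pred: Span, tolerance: int) -> bool:
    """True if labels agree and both boundaries are within `tolerance` characters."""
    return (
        gold[0] == pred[0]
        and abs(gold[1] - pred[1]) <= tolerance
        and abs(gold[2] - pred[2]) <= tolerance
    )


def fuzzy_diff(gold: List[Span], predicted: List[Span], tolerance: int = 0) -> Dict[str, list]:
    """Classify predictions as exact, near (almost correct), or spurious."""
    exact, near, spurious = [], [], []
    remaining_gold = list(gold)
    remaining_pred = []
    # First pass: pair off exact matches so a near match cannot claim their gold span.
    for pred in predicted:
        if pred in remaining_gold:
            remaining_gold.remove(pred)
            exact.append(pred)
        else:
            remaining_pred.append(pred)
    # Second pass: pair leftover predictions with the first gold span within tolerance.
    for pred in remaining_pred:
        match = next((g for g in remaining_gold if _close(g, pred, tolerance)), None)
        if match is None:
            spurious.append(pred)
        else:
            remaining_gold.remove(match)
            near.append((match, pred))
    return {"exact": exact, "near": near, "missed": remaining_gold, "spurious": spurious}


def corpus_diff(documents: Dict[str, Tuple[List[Span], List[Span]]], tolerance: int = 0) -> Dict[str, int]:
    """Sum the per-document diff counts over a whole corpus."""
    totals: Counter = Counter()
    for gold, predicted in documents.values():
        result = fuzzy_diff(gold, predicted, tolerance)
        totals.update({key: len(items) for key, items in result.items()})
    return dict(totals)


corpus = {
    "doc1": ([("Drug", 0, 7), ("Dose", 8, 13)], [("Drug", 0, 7), ("Dose", 8, 12)]),
    "doc2": ([("Route", 20, 24)], [("Drug", 30, 35)]),
}
strict_totals = corpus_diff(corpus, tolerance=0)
lenient_totals = corpus_diff(corpus, tolerance=2)
```

With `tolerance=0` the off-by-one "Dose" prediction in doc1 counts as both missed and spurious. With `tolerance=2` it moves to the "near" bucket, which is the set of almost-correct predictions to highlight.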

AndriyMulyar, Jan 01 '19 20:01

I believe the functionality you described is covered by Annotations.compare_by_index(), which has the strict parameter for fuzzy predictions.

swfarnsworth, Jan 10 '19 17:01