Discontinuous annotations (Brat 1.3)

Open jamesdunham opened this issue 7 years ago • 1 comments

I recently discovered Brat's (v1.3) support for discontinuous annotations. These can be created intentionally by editing an existing annotation and clicking the 'Add Frag.' button. They also seem to be created, at least sometimes, when an annotation is interrupted by a newline. brat_to_conll.py doesn't expect this.

For a .txt file that begins

Lorem ipsum dolor

a discontinuous annotation spanning "Lorem" and "dolor" results in an .ann file containing the line

T1	Org 0 5;12 17	Lorem dolor

This .ann file will lead to an error in brat_to_conll.py: after the line is split with .split(), the fourth element (index 3), 5;12, is passed to int() as the annotation's end position, which raises a ValueError.
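The failure can be reproduced outside the script with a few lines. This is only a sketch of the parsing step as described above, not the actual code in brat_to_conll.py; the names `fields` and `error_message` are made up for illustration.

```python
# The example annotation line from the .ann file (fields are tab-separated).
line = "T1\tOrg 0 5;12 17\tLorem dolor"

# Whitespace splitting, as described above, leaves "5;12" as a single token.
fields = line.split()
# fields is ['T1', 'Org', '0', '5;12', '17', 'Lorem', 'dolor']

try:
    # Assumption: the end offset is read from index 3.
    end = int(fields[3])
except ValueError as err:
    error_message = str(err)
    print("Could not parse end offset:", error_message)
```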

One way to handle this would be to check lines for more than one start-end position pair, and break apart multiple pairs - moving them to their own lines and duplicating the entity label. This would work for my case. (I'd be happy to submit a PR.) Is it a general-purpose solution?
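A minimal sketch of that idea, assuming the standoff layout shown above (ID, tab, label and offsets, tab, text). The function name `split_discontinuous` and the `_1`, `_2` ID suffixes are hypothetical choices for illustration, not part of Brat or NeuroNER. The fragment text is sliced from the document rather than from the annotation's text field, since a fragment may itself contain spaces.

```python
def split_discontinuous(ann_line, doc_text):
    """Return one text-bound annotation line per fragment of ann_line."""
    ann_line = ann_line.rstrip("\n")
    # Only text-bound annotations (IDs starting with "T") carry offsets.
    if not ann_line.startswith("T"):
        return [ann_line]

    ann_id, label_and_spans, _ = ann_line.split("\t", 2)
    label, spans = label_and_spans.split(" ", 1)
    fragments = spans.split(";")  # e.g. ["0 5", "12 17"]
    if len(fragments) == 1:
        return [ann_line]  # already contiguous, nothing to do

    new_lines = []
    for i, fragment in enumerate(fragments, start=1):
        start, end = (int(offset) for offset in fragment.split())
        # Hypothetical ID scheme: suffix the original ID to keep IDs unique.
        new_lines.append(
            f"{ann_id}_{i}\t{label} {start} {end}\t{doc_text[start:end]}"
        )
    return new_lines


result = split_discontinuous("T1\tOrg 0 5;12 17\tLorem dolor", "Lorem ipsum dolor")
for new_line in result:
    print(new_line)
```

Each fragment becomes its own entity with the same label, so the link between the fragments is lost. Whether that is acceptable depends on the task.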

jamesdunham • Jul 16 '17 02:07

I also ran into this problem but was too lazy to fix it.

Gregory-Howard • Jul 24 '17 14:07