NeuroNER
Discontinuous annotations (Brat 1.3)
I recently discovered Brat's (v1.3) support for discontinuous annotations. These can be created intentionally by editing an existing annotation and clicking the 'Add Frag.' button. They also seem to be created, at least sometimes, when an annotation is interrupted by a newline. brat_to_conll.py doesn't expect this.
For a .txt file that begins
Lorem ipsum dolor
a discontinuous annotation spanning "Lorem" and "dolor" results in an .ann file containing the line
T1 Org 0 5;12 17 Lorem dolor
This .ann file will lead to an error in brat_to_conll.py after the line is .split() and the element at index 3, 5;12, is passed to int() as the annotation's end position.
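For anyone who wants to reproduce this outside NeuroNER, here is a minimal sketch of the failure. The function name naive_end_offset is hypothetical and only mimics the split-then-int() step described above; it is not copied from brat_to_conll.py.

```python
def naive_end_offset(ann_line):
    """Hypothetical stand-in for the parsing step: split on whitespace
    and treat the element at index 3 as the end offset."""
    fields = ann_line.split()
    return int(fields[3])


# A contiguous annotation parses fine.
print(naive_end_offset("T2 Org 6 11 ipsum"))

# The discontinuous annotation from this issue does not:
# the element at index 3 is "5;12", which int() rejects.
try:
    naive_end_offset("T1 Org 0 5;12 17 Lorem dolor")
except ValueError as err:
    print("ValueError:", err)
```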
One way to handle this would be to check lines for more than one start-end position pair, and break apart multiple pairs - moving them to their own lines and duplicating the entity label. This would work for my case. (I'd be happy to submit a PR.) Is it a general-purpose solution?
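A rough sketch of what I have in mind, not a tested patch. The names ANN_LINE and split_discontinuous are made up for illustration. It returns one tuple per fragment rather than rewriting the .ann file, and it reuses the original ID for every fragment, which is an assumption (it might not be acceptable if other annotations refer to that ID). When the document text is not passed in, it also assumes Brat joins the fragment texts with a single space.

```python
import re

# Matches "ID LABEL START END[;START END...] TEXT". Brat separates the main
# fields with tabs, but any whitespace between fields is accepted here.
ANN_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(\d+ \d+(?:;\d+ \d+)*)\s+(.*)$")


def split_discontinuous(ann_line, doc_text=None):
    """Return one (id, label, start, end, text) tuple per fragment.

    doc_text is the content of the matching .txt file; if given, fragment
    text is sliced from it instead of from the annotation's own text.
    """
    match = ANN_LINE.match(ann_line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a text-bound annotation: {ann_line!r}")
    ann_id, label, offsets, ann_text = match.groups()

    fragments = []
    cursor = 0  # position inside ann_text, used only without doc_text
    for pair in offsets.split(";"):
        start, end = (int(n) for n in pair.split())
        if doc_text is not None:
            text = doc_text[start:end]
        else:
            # Assumption: fragments are joined by exactly one space.
            text = ann_text[cursor:cursor + (end - start)]
            cursor += (end - start) + 1
        fragments.append((ann_id, label, start, end, text))
    return fragments


for fragment in split_discontinuous("T1 Org 0 5;12 17 Lorem dolor"):
    print(fragment)
```

A contiguous annotation comes back as a single tuple, so the same function could be applied to every entity line. Whether treating the fragments as separate entities is right in general probably depends on the task.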
I also had this problem but was too lazy to fix it.