Add handling of discontinuous annotations (brat >= 1.3).

Open jamesdunham opened this issue 7 years ago • 10 comments

This PR addresses the issue described in #36 when brat_to_conll.py encounters discontinuous annotations created by brat >= 1.3. These can be created unintentionally by including a newline in the span of an annotation, or manually ("Add Frag").

I implemented two possible behaviors. A discontinuous annotation can either be split into multiple annotations (one for each fragment) or joined into an expanded annotation that starts with the first fragment and ends with the last. For examples see test_brat_to_conll.py.

The choice is controlled by a new parameter split_discontinuous. Its default is False, i.e., joining, because of the case where discontinuous annotations are unintentional.
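As a rough illustration of the two behaviors (not the code in this PR), here is a minimal sketch. It assumes the brat standoff convention in which the fragments of a discontinuous annotation are written as semicolon-separated offset pairs, e.g. `0 9;10 19`. The helper names `parse_offsets`, `resolve_discontinuous`, and `parse_textbound_line` are hypothetical; only the `split_discontinuous` parameter name comes from the description above.

```python
from typing import List, Tuple

# (label, start, end) for one contiguous span
Annotation = Tuple[str, int, int]


def parse_offsets(span_field: str) -> List[Tuple[int, int]]:
    """Parse a brat offset field such as '0 9;10 19' into [(0, 9), (10, 19)]."""
    fragments = []
    for fragment in span_field.split(";"):
        start, end = fragment.split()
        fragments.append((int(start), int(end)))
    return fragments


def resolve_discontinuous(
    label: str,
    fragments: List[Tuple[int, int]],
    split_discontinuous: bool = False,
) -> List[Annotation]:
    """Turn the fragments of one annotation into contiguous annotations.

    split_discontinuous=True:  one annotation per fragment.
    split_discontinuous=False: a single annotation spanning from the start
                               of the first fragment to the end of the last.
    """
    if split_discontinuous:
        return [(label, start, end) for start, end in fragments]
    start = min(s for s, _ in fragments)
    end = max(e for _, e in fragments)
    return [(label, start, end)]


def parse_textbound_line(line: str, split_discontinuous: bool = False) -> List[Annotation]:
    """Parse one tab-separated text-bound line: ID, 'LABEL offsets', text."""
    _ann_id, type_and_span, _text = line.rstrip("\n").split("\t", 2)
    label, span_field = type_and_span.split(" ", 1)
    return resolve_discontinuous(label, parse_offsets(span_field), split_discontinuous)


# An annotation whose span was broken by a newline at offset 9 (made-up example).
line = "T1\tDISABILITY 0 9;10 19\tdependent on others"
joined = parse_textbound_line(line)                              # default: join
split = parse_textbound_line(line, split_discontinuous=True)     # one per fragment
```

With the default, the made-up line above yields a single span covering both fragments; with `split_discontinuous=True` it yields one span per fragment. The tests in test_brat_to_conll.py remain the authoritative examples of the actual behavior.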

jamesdunham avatar Sep 03 '17 20:09 jamesdunham

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 03 '17 20:09 CLAassistant

Does this change also work for converting from CoNLL to brat, or is it unnecessary to change the conll_to_brat file?

rriveraz avatar May 03 '18 12:05 rriveraz

It isn't necessary. The issue is only with brat to conll.

jamesdunham avatar May 03 '18 13:05 jamesdunham

OK, thank you for the help. Do you have an example of the output? Also, do you have the code with the changes? If so, I would really appreciate it if you could share it with me. Thank you.

rriveraz avatar May 03 '18 15:05 rriveraz

Sure, the new tests demonstrate the changes. If you're running into problems with discontinuous annotations and need this fix now, you could clone my fork. It's up to date at the moment.

jamesdunham avatar May 03 '18 17:05 jamesdunham

Hi James. I wonder if you can help with this problem. I have annotations that are nested inside other annotations, for example:

T2 SCOPE 53 69 with no dementia
T3 NEGATION 58 60 no
T4 DISABILITY 61 69 dementia

or

T3 SCOPE 1420 1455 not dependent on others for walking
T4 NEGATION 1420 1423 not
T5 DISABILITY 1424 1455 dependent on others for walking

I think I could handle them like discontinuous annotations, but I don't know if this is the best option. When I use the original brat_to_conll file, it always keeps only the first annotation, in this case the SCOPE annotation. Do you know how to handle this kind of nested annotation? I really appreciate your help. Thank you.

rriveraz avatar May 04 '18 08:05 rriveraz

Sorry, I haven't looked into options for handling overlapping annotations.

jamesdunham avatar May 04 '18 10:05 jamesdunham

Thanks for your help, James. By the way, is it possible to identify relations between entities with NeuroNER, given a brat annotation like this one?

T39 disease 72 82 carcinomas
T56 body-part 61 71 colorectal
R1 relatedTo Arg1:T39 Arg2:T56

Thank you

rriveraz avatar May 04 '18 12:05 rriveraz

Hi James.

Just a quick question: do you know if NeuroNER uses some kind of padding for character embeddings?

Hope you can help me with this.

rriveraz avatar Aug 21 '18 21:08 rriveraz

Whatever happened to this PR? I'm trying to load a bunch of brat annotation files with discontinuous annotations. Is @jamesdunham's fork still the only option, and is the master branch ahead of it in other ways?

Jongmassey avatar May 20 '19 11:05 Jongmassey