
Each AnnotationLayer should have children from only one set

Open · kosloot opened this issue 8 years ago · 6 comments

At the moment it is possible to add children from different sets to an annotation layer. This is undesirable: a layer by definition has the set of its children, and multiple sets would violate this. The simplest solution is to let the append method of a layer check the set of every child. The first child would determine the set of the layer (if it was not set on creation).

kosloot, Oct 26 '17
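The append check proposed above can be sketched roughly as follows. This is a minimal illustration, not pynlpl's or libfolia's actual code: the classes `Annotation` and `AnnotationLayer` and the attribute `annotation_set` are hypothetical stand-ins. The layer either receives its set on creation or adopts the set of its first child, and it rejects any later child from a different set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical stand-in for a span annotation such as an entity."""
    cls: str
    annotation_set: Optional[str] = None


@dataclass
class AnnotationLayer:
    """Hypothetical layer that allows children from a single set only."""
    annotation_set: Optional[str] = None
    children: List[Annotation] = field(default_factory=list)

    def append(self, child: Annotation) -> Annotation:
        if child.annotation_set is None:
            # A child without an explicit set inherits the layer's set, if known.
            child.annotation_set = self.annotation_set
        if self.annotation_set is None:
            # No set given on creation: the first child determines it.
            self.annotation_set = child.annotation_set
        elif child.annotation_set != self.annotation_set:
            # A second set in the same layer is rejected.
            raise ValueError(
                f"child set {child.annotation_set!r} does not match "
                f"layer set {self.annotation_set!r}"
            )
        self.children.append(child)
        return child


if __name__ == "__main__":
    layer = AnnotationLayer()                       # set not given on creation
    layer.append(Annotation("per", "ner"))          # first child fixes the set
    layer.append(Annotation("loc"))                 # inherits "ner"
    try:
        layer.append(Annotation("mwu", "mwu"))      # different set: rejected
    except ValueError as err:
        print(err)
```

Raising on the mismatch, rather than silently creating a second layer, matches the behaviour later described for the C++ libfolia (check and reject).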

I'm not entirely sure this is the issue I've run into, but it may be. As it happens, Frog can produce FoLiA output with two sets for entities: one for NER and one for MWU. I ran into this while trying to add entities from gold annotations that Frog missed. It looks like there are two options. Either you pass the Entity class to the add method, in which case exactly one set must be available and it is used as the default set; or you add an Entity instance you created yourself, but then no suitable common ancestor can be found.

A workaround in this case is to run Frog with --skip=m, so that it skips MWU detection and adds only one set for entity annotations to its output.

asharkinasuit, Mar 29 '18

Are you saying it does not find a common ancestor when you call add() with an explicit set= keyword argument? If there are multiple sets, it should raise an exception stating that a set argument is required (if there is only one, that one is indeed the default). So this may be a bug in pynlpl; the workaround should not be necessary.

proycon, Mar 29 '18
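The set-resolution behaviour described above (a single declared set acts as the default, while multiple sets require an explicit set argument) can be sketched as a small function. This is an illustrative assumption, not pynlpl's implementation: `resolve_set`, `SetResolutionError`, the `declarations` mapping and the set names `"ner-set"` and `"mwu-set"` are all hypothetical. The point is that each failure case gets its own clear message instead of a later "no common ancestor" error.

```python
from typing import Dict, List, Optional


class SetResolutionError(ValueError):
    """Hypothetical error raised when the set for a new annotation is unclear."""


def resolve_set(
    declarations: Dict[str, List[str]],
    annotation_type: str,
    explicit_set: Optional[str] = None,
) -> str:
    """Choose the set for a new annotation (hypothetical helper).

    `declarations` maps an annotation type (e.g. "entity") to the sets
    declared for it in the document metadata.
    """
    declared = declarations.get(annotation_type, [])
    if explicit_set is not None:
        # An explicit set must have been declared in the metadata.
        if explicit_set not in declared:
            raise SetResolutionError(
                f"set {explicit_set!r} is not declared for {annotation_type!r}"
            )
        return explicit_set
    if len(declared) == 1:
        # Exactly one declared set: it serves as the default.
        return declared[0]
    if not declared:
        raise SetResolutionError(f"no set declared for {annotation_type!r}")
    # Several sets (e.g. NER and MWU entities from Frog): caller must choose.
    raise SetResolutionError(
        f"{annotation_type!r} has multiple declared sets {declared}; "
        "pass an explicit set argument"
    )


if __name__ == "__main__":
    decl = {"entity": ["ner-set", "mwu-set"], "pos": ["pos-set"]}
    print(resolve_set(decl, "pos"))                # single set: used as default
    print(resolve_set(decl, "entity", "ner-set"))  # explicit set: accepted
    try:
        resolve_set(decl, "entity")                # ambiguous: explicit set required
    except SetResolutionError as err:
        print(err)
```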

I believe I neglected the set keyword. I see that if it is present, there shouldn't be a problem.

asharkinasuit, Mar 29 '18

Well, if you get the "no common ancestor" error when you omit set, I still count that as a bug, because the error should be clearer.

proycon, Mar 29 '18

I think that error was probably the result of using things in ways they weren't intended to be used, but I've been hacking at this for several days and it's a bit fuzzy. It might have been when I created an Entity myself and it couldn't find any ancestors. I remember tracing the ancestor lookup in the debugger, and it looked like something you just weren't supposed to do, rather than a missing or unclear error message.

asharkinasuit, Mar 29 '18

As a side note: the C++ version of libfolia is designed to check for and reject this, but I am not sure whether the Python library does so yet. Frog generates the MWU and NER entity layers in different sets, so there should not be a problem. However, when manually editing Frog output, you should be aware of the different set names and use the appropriate one, or create a new set name and add its declaration to the annotations section of the metadata. Also, entities should always reside in a layer.

kosloot, Mar 29 '18