
load_conllu throws errors with enhanced dependencies in UD-r2.2

Open venkatasg opened this issue 6 years ago • 3 comments

In UD-r2.2, to accommodate the use of empty nodes for the analysis of ellipsis in enhanced dependencies, the HEAD column (gov in the code) of those nodes is set to _. This throws an error in the load_conllu function, since DepTriple is called with int(gov) as one of the arguments, and int('_') raises a ValueError. UD explains these nodes here and here.

The fix is easy enough: check the first column for '.', since UD stipulates that empty nodes must have an index of the form i.1 (then i.2, and so on), where i is the index of the word the empty node is inserted after. If '.' is present, ignore that line. Unless there is some information we can extract from the empty node?
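A minimal sketch of this workaround, not PredPatt's actual load_conllu: `read_basic_tokens` and the sample annotation are hypothetical and only illustrate the ID check. It also skips multiword-token range lines (IDs such as 1-2), which likewise carry _ in the HEAD column.

```python
def read_basic_tokens(sentence_block):
    """Yield (id, form, head, deprel) for regular tokens of one CoNLL-U sentence.

    Hypothetical helper: comment lines, empty nodes (IDs like 5.1) and
    multiword-token ranges (IDs like 1-2) are skipped, so int() is only
    ever applied to plain integer columns.
    """
    for line in sentence_block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id, form, head, deprel = cols[0], cols[1], cols[6], cols[7]
        if "." in token_id or "-" in token_id:
            continue  # empty node or multiword range: HEAD is "_"
        yield int(token_id), form, int(head), deprel


# Illustrative annotation only; not copied from a UD treebank.
SAMPLE = "\n".join([
    "# text = Bill ate cookies and Tom cake",
    "1\tBill\tBill\tPROPN\t_\t_\t2\tnsubj\t2:nsubj\t_",
    "2\tate\teat\tVERB\t_\t_\t0\troot\t0:root\t_",
    "3\tcookies\tcookie\tNOUN\t_\t_\t2\tobj\t2:obj\t_",
    "4\tand\tand\tCCONJ\t_\t_\t5\tcc\t5.1:cc\t_",
    "5\tTom\tTom\tPROPN\t_\t_\t2\tconj\t5.1:nsubj\t_",
    "5.1\tate\teat\tVERB\t_\t_\t_\t_\t2:conj\t_",
    "6\tcake\tcake\tNOUN\t_\t_\t5\torphan\t5.1:obj\t_",
])

tokens = list(read_basic_tokens(SAMPLE))
```

With the empty node dropped, every remaining line has an integer HEAD, so the int(gov) conversion no longer fails.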

venkatasg avatar Jun 19 '18 22:06 venkatasg

Yes, that's an easy workaround, but it loses empty nodes, which are potentially useful for PredPatt. A better solution is to rewrite the way PredPatt deals with the UD index. That requires more effort.
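One possible shape for such a rewrite, sketched under assumptions: IDs are represented as (major, minor) tuples instead of plain ints, so that 5.1 sorts between 5 and 6. The names `parse_id` and `parse_enhanced_deps` are hypothetical and are not part of PredPatt.

```python
def parse_id(raw):
    """Convert a CoNLL-U ID string to a sortable tuple.

    "5" becomes (5, 0) and "5.1" becomes (5, 1). Multiword ranges such as
    "1-2" are not handled in this sketch.
    """
    major, _, minor = raw.partition(".")
    return (int(major), int(minor) if minor else 0)


def parse_enhanced_deps(deps):
    """Parse the DEPS column into a list of (head_id, relation) pairs.

    The head is split off at the first colon only, so relation subtypes
    such as "nmod:poss" stay intact.
    """
    if deps == "_":
        return []
    edges = []
    for item in deps.split("|"):
        head, _, rel = item.partition(":")
        edges.append((parse_id(head), rel))
    return edges


ordered = sorted(parse_id(i) for i in ["6", "5.1", "5"])
edges = parse_enhanced_deps("2:conj|5.1:nsubj")
```

Tuples keep ordinary tokens and empty nodes in one ordered index space, at the cost of touching every place that currently assumes an integer position.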

sheng-z avatar Jun 20 '18 00:06 sheng-z

I’m not sure how useful it will be for PredPatt. For one thing, I could find no more than 30 instances of this kind of empty node in UD. I’ll look into specific examples, but in a sentence like ‘Bill ate cookies, and Tom cake’, I believe the HEAD of Tom is still ate? Or perhaps the conjunction plays a role in this.
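A count like this can be reproduced with a short script. This is a sketch, not the method used above; `count_empty_nodes` is a hypothetical helper that matches first-column IDs of the form digits.digits.

```python
import re

# An empty-node ID is two integers separated by a dot, e.g. "5.1".
EMPTY_NODE_ID = re.compile(r"^\d+\.\d+$")


def count_empty_nodes(lines):
    """Count CoNLL-U token lines whose ID column marks an empty node."""
    count = 0
    for line in lines:
        if line.startswith("#"):
            continue  # comment lines may contain decimals in free text
        first_column = line.split("\t", 1)[0].strip()
        if EMPTY_NODE_ID.match(first_column):
            count += 1
    return count


example_lines = [
    "# text = version 2.2 of the data",
    "1\tBill\tBill\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "1-2\tdon't\t_\t_\t_\t_\t_\t_\t_\t_",
    "5.1\tate\teat\tVERB\t_\t_\t_\t_\t2:conj\t_",
    "",
]
n_empty = count_empty_nodes(example_lines)
```

Because the function accepts any iterable of lines, an open file handle for a .conllu file can be passed to it directly.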

venkatasg avatar Jun 20 '18 02:06 venkatasg

Does PredPatt use any information from enhanced dependencies at all? Right now I don't think so, and if we want it to in the future, that will involve a lot of changes. For now, I think it's a good idea to just ignore empty nodes, so that PredPatt doesn't throw an error with UD-r2.2.

venkatasg avatar Jun 20 '18 15:06 venkatasg