nlp4j-old
Two root nodes for one sentence from the dependency parser, any advice?
We trained a UD model with the UD treebank plus the WSJ converted to UD with the Stanford converter. Every so often, a sentence we run comes out with a seemingly impossible structure with an 'extra' root node. The cases we've seen have always involved the 'conj' label.
Does this suggest anything to you? I could share the data and/or the model file if you are interested.
In Anglo-American common law courts, appellate review of lower court decisions may also be obtained by filing a petition for review by prerogative writ in certain cases.
case(courts-5, In-1)
amod(courts-5, Anglo-American-2)
amod(courts-5, common-3)
compound(courts-5, law-4)
conj(ROOT-0, courts-5)
punct(courts-5, ,-6)
amod(review-8, appellate-7)
conj(courts-5, review-8)
case(decisions-12, of-9)
amod(decisions-12, lower-10)
compound(decisions-12, court-11)
nmod(review-8, decisions-12)
aux(obtained-16, may-13)
advmod(obtained-16, also-14)
auxpass(obtained-16, be-15)
root(ROOT-0, obtained-16)
mark(filing-18, by-17)
advcl(obtained-16, filing-18)
det(petition-20, a-19)
dobj(filing-18, petition-20)
case(review-22, for-21)
nmod(filing-18, review-22)
case(writ-25, by-23)
compound(writ-25, prerogative-24)
nmod(filing-18, writ-25)
case(cases-28, in-26)
amod(cases-28, certain-27)
nmod(filing-18, cases-28)
punct(obtained-16, .-29)
I've been able to demonstrate this using your stock English model.
As well as for complex voice emotional recognition for emotions not included in Mind Reading .
1 As as RB _ 3 advmod _ @#r$%
2 well well RB _ 3 advmod _ @#r$%
3 as as IN _ 5 advmod _ @#r$%
4 for for IN _ 0 conj _ @#r$%
5 complex complex JJ _ 8 nmod _ @#r$%
6 voice voice NN _ 8 nmod _ @#r$%
7 emotional emotional JJ _ 8 nmod _ @#r$%
8 recognition recognition NN _ 4 pobj _ @#r$%
9 for for IN _ 8 prep _ @#r$%
10 emotions emotion NNS _ 9 pobj _ @#r$%
11 not not RB _ 12 neg _ @#r$%
12 included include VBN pos2=VBD 0 root _ @#r$%
13 in in IN _ 12 prep _ @#r$%
14 Mind mind NN pos2=NNP 15 compound _ @#r$%
15 Reading reading NN pos2=VBG 13 pobj _ @#r$%
16 . . . _ 12 punct _ @#r$%
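The condition is easy to detect mechanically in output like the above: a well-formed tree has exactly one token whose head is 0. A minimal sketch (this is a standalone script, not NLP4J's API; the column layout is taken from the output above, where the head is the sixth field and the label the seventh) that flags such "two-headed" sentences:

```python
# Count how many tokens in a CoNLL-style parse claim the artificial
# root (head = 0).  More than one is the condition described above.
# Sample rows abbreviated from the parse of the "Mind Reading" sentence.
conll = """\
1 As as RB _ 3 advmod
4 for for IN _ 0 conj
12 included include VBN pos2=VBD 0 root
16 . . . _ 12 punct"""

def root_children(block):
    """Return (token id, deprel) for every token whose head is 0."""
    roots = []
    for line in block.splitlines():
        cols = line.split()
        if len(cols) >= 7 and cols[5] == "0":
            roots.append((int(cols[0]), cols[6]))
    return roots

roots = root_children(conll)
print(roots)            # [(4, 'conj'), (12, 'root')]
print(len(roots) > 1)   # True: two children of node 0
```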
I tried adding a feature to the model.
Please don't laugh if my attempt to figure out the feature templates was unsuccessful.
My idea was to discourage things like a conj deprel with the root. Of course, this sort of thing can't globally ding sentences for having more than one root. If there's a way to introduce that idea into the feature set I haven't understood it yet.
<feature f0="i:dependency_label" f1="i_h:part_of_speech_tag"/>
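If I've read the template right, it conjoins the candidate arc's dependency label with the head's POS tag. A hypothetical illustration of the kind of feature string that pairing would produce (this is my guess at the intent, not NLP4J's actual feature extractor; the string format is made up):

```python
# Hypothetical sketch: the template above pairs the dependent's
# predicted label (f0) with its head's POS tag (f1), so an arc like
# conj(ROOT-0, courts-5) would yield a feature the model could learn
# to weight against.
def arc_feature(dep_label, head_pos):
    return f"i:dependency_label={dep_label}|i_h:part_of_speech_tag={head_pos}"

print(arc_feature("conj", "ROOT"))
# i:dependency_label=conj|i_h:part_of_speech_tag=ROOT
```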
With my universal training data (UD treebank + converted PTB), accuracy on the UD dev set was unchanged,
UAS 0.88 LAS 0.85, total tokens 25148
but the number of two-headed outputs on the UD dev set dropped from 4% to 3%.
We somewhat belatedly read the papers and understand that this is expected, so we're thinking about how to cope.
Sorry for the late reply; I was meeting a grant proposal deadline. Multiple roots may be caused by headless nodes; when the parser doesn't find a head for a node, it connects that node to the root by default to keep the entire tree connected, but this is something I should retouch. I'm planning to adapt our structure to UD more now, so I can experiment with this more myself as well.
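Given that fallback behavior, one way to cope on the consumer side is a post-processing pass: keep the token actually labeled 'root' as the head of the sentence and re-attach any other child of node 0 to it. A sketch of that heuristic (my own workaround, not NLP4J code; the dict-based token representation is just for illustration):

```python
# Heuristic repair for multi-root output: re-attach every extra child
# of the artificial root (head = 0) to the token labeled 'root'.
def fix_extra_roots(tokens):
    """tokens: list of dicts with 'id', 'head', 'deprel' keys."""
    real_root = next((t for t in tokens
                      if t["head"] == 0 and t["deprel"] == "root"), None)
    if real_root is None:
        return tokens          # no labeled root; leave the parse alone
    for t in tokens:
        if t["head"] == 0 and t is not real_root:
            t["head"] = real_root["id"]   # heuristic re-attachment
    return tokens

# Mimicking the prerogative-writ example: conj(ROOT-0, courts-5)
# alongside root(ROOT-0, obtained-16).
sent = [{"id": 5,  "head": 0, "deprel": "conj"},
        {"id": 16, "head": 0, "deprel": "root"}]
fix_extra_roots(sent)
print(sent[0]["head"])  # 16: courts-5 now hangs off obtained-16
```

Whether 'conj' or some other label is the right one for the re-attached arc is a separate question; this only restores the single-root tree shape.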
OK, thanks.
On Thu, Jul 21, 2016 at 2:33 PM, Jinho D. Choi [email protected] wrote: