ud-annotatrix

Multiword token range sometimes being saved as HEAD

Open nschneid opened this issue 2 years ago • 12 comments

For words in a multiword token, when I export (download) a .conllu file, sometimes their dependents have the entire MWT in the HEAD column, e.g. 1-2 instead of 1. This breaks the viewer when I reopen the sentence.

nschneid, Mar 20 '22 14:03

Could you post an example sentence or screenshot that breaks like this?

kmurphy4, Mar 20 '22 14:03

Now my setup is borked and I can't get the viewer to work for any sentence even with a new upload. :( But here is the sentence that was giving me grief:

1-2	Here's	_	_	_	_	_	_	_	_
1	Here	here	ADV	RB	PronType=Dem	0	root	_	start_char=89|end_char=93
2	's	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	1	cop	_	start_char=93|end_char=95
3	the	the	DET	DT	Definite=Def|PronType=Art	4	det	_	start_char=96|end_char=99
4	paper	paper	NOUN	NN	Number=Sing	1	nsubj	_	start_char=100|end_char=105
5	that	that	SCONJ	WDT	PronType=Rel	11	mark	_	start_char=106|end_char=110
6	people	people	NOUN	NNS	Number=Plur	11	nsubj	_	start_char=111|end_char=117
7	who	who	PRON	WP	PronType=Rel	8	nsubj	_	start_char=118|end_char=121
8	read	read	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	6	acl:relcl	_	start_char=122|end_char=126
9	it	it	PRON	PRP	Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs	8	obj	_	start_char=127|end_char=129
10	will	will	AUX	MD	VerbForm=Fin	11	aux	_	start_char=130|end_char=134
11	find	find	VERB	VB	VerbForm=Inf	4	acl:relcl	_	start_char=135|end_char=139
12	out	out	ADP	RP	_	11	compound:prt	_	start_char=140|end_char=143
13	about	about	SCONJ	IN	_	17	mark	_	start_char=144|end_char=149
14	how	how	ADV	WRB	PronType=Int	17	advmod	_	start_char=150|end_char=153
15	resumptive	resumptive	ADJ	JJ	Degree=Pos	16	amod	_	start_char=154|end_char=164
16	pronouns	pronoun	NOUN	NNS	Number=Plur	17	nsubj	_	start_char=165|end_char=173
17	help	help	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	11	advcl	_	start_char=174|end_char=178
18	islands	island	NOUN	NNS	Number=Plur	17	obj	_	start_char=179|end_char=186
19	go	go	VERB	VB	VerbForm=Inf	17	xcomp	_	start_char=187|end_char=189
20	down	down	ADP	IN	_	19	compound:prt	_	start_char=190|end_char=194
21	a	a	DET	DT	Definite=Ind|PronType=Art	22	det	_	start_char=195|end_char=196
22	little	little	ADJ	JJ	Degree=Pos	23	obl:npmod	_	start_char=197|end_char=203
23	easier	easier	ADJ	JJR	Degree=Cmp	19	advmod	_	start_char=204|end_char=210
24	.	.	PUNCT	.	_	1	punct	_	start_char=210|end_char=211

Somehow token 2 was becoming unattached and token 24 was showing up with 1-2 as its head.

nschneid, Mar 20 '22 14:03
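To check an exported file for the symptom described above (a word whose HEAD is a multiword-token range such as 1-2), something like the following could help. It is only a minimal sketch: `findBadHeads` is a hypothetical helper, not part of ud-annotatrix or notatrix, and it assumes the standard 10-column tab-separated CoNLL-U layout with HEAD in the seventh column.

```javascript
// Sketch: flag CoNLL-U word lines whose HEAD column is not a plain integer.
// findBadHeads is a hypothetical helper, not part of ud-annotatrix or notatrix.
function findBadHeads(conllu) {
  const problems = [];
  for (const line of conllu.split("\n")) {
    // Skip blank lines and sentence-level comments.
    if (line.trim() === "" || line.startsWith("#")) continue;
    const cols = line.split("\t");
    if (cols.length !== 10) continue; // not a 10-column token line
    const id = cols[0];
    const form = cols[1];
    const head = cols[6];
    // Range lines ("1-2") and empty nodes ("5.1") carry "_" in HEAD by design,
    // so only ordinary word lines are checked.
    if (!/^\d+$/.test(id)) continue;
    // HEAD should be 0 (root), another word's integer ID, or "_" if unannotated.
    if (head !== "_" && !/^\d+$/.test(head)) {
      problems.push({ id, form, head });
    }
  }
  return problems;
}

// Abbreviated version of the sentence above, with the bad HEAD on token 24.
const sample = [
  "1-2\tHere's\t_\t_\t_\t_\t_\t_\t_\t_",
  "1\tHere\there\tADV\tRB\t_\t0\troot\t_\t_",
  "2\t's\tbe\tAUX\tVBZ\t_\t1\tcop\t_\t_",
  "24\t.\t.\tPUNCT\t.\t_\t1-2\tpunct\t_\t_",
].join("\n");
console.log(findBadHeads(sample)); // reports token 24 with head "1-2"
```

This only detects the broken export; it does not explain how the range ended up in the HEAD column.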

The viewer is not working anymore in Firefox but I'm able to view and download it in Chrome just fine. And it downloads the correct parse. Do I need to clear my Firefox cache or something?

nschneid, Mar 20 '22 14:03

> Do I need to clear my Firefox cache or something?

And it works in Firefox Private Browsing mode, so something got messed up in my browser session.

nschneid, Mar 20 '22 18:03

> Now my setup is borked [...] But here is the sentence that was giving me grief: [...]
>
> Somehow token 2 was becoming unattached and token 24 was showing up with 1-2 as its head.

Hm, if I copy-paste that sentence into the textbox, it seems to work? [screenshot]

What else do I need to do to repro your issue?

kmurphy4, Mar 20 '22 22:03

Not sure. In a new browser session I can't reproduce. Must have something to do with corrupted local storage or whatever in my original session.

nschneid, Mar 20 '22 22:03

Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

nschneid, Mar 20 '22 22:03

Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:

1-2	'sHere	_	_	_	_	_	_	_	_

nschneid, Mar 20 '22 22:03

This should check the indices to see which token is first (combining two tokens into a multiword token/supertoken):

https://github.com/jonorthwash/ud-annotatrix/blob/e951f7255e5b02b8c545af13a4a4e1e53f67054f/notatrix/src/nx/sentence.js#L381

also here (merging two tokens into one regular token):

https://github.com/jonorthwash/ud-annotatrix/blob/e951f7255e5b02b8c545af13a4a4e1e53f67054f/notatrix/src/nx/sentence.js#L324

nschneid, Mar 20 '22 23:03
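The index check suggested above could look roughly like this. It is a sketch of the idea only, not the code at the linked lines: `orderedPair` and `combineForms` are hypothetical names, and the `index` and `form` fields are assumptions about the token objects, which may be shaped differently in notatrix.

```javascript
// Sketch: order two tokens by position before combining them.
// `index` and `form` are assumed field names, not the actual notatrix Token API.
function orderedPair(a, b) {
  return a.index <= b.index ? [a, b] : [b, a];
}

// Build the ID range and surface form for a multiword token covering both words.
function combineForms(a, b) {
  const [first, second] = orderedPair(a, b);
  return {
    id: `${first.index}-${second.index}`,
    form: first.form + second.form,
  };
}

const here = { index: 1, form: "Here" };
const clitic = { index: 2, form: "'s" };
// The result is the same whichever token was clicked first:
// the id is "1-2" and the form is "Here's", never "'sHere".
console.log(combineForms(clitic, here));
console.log(combineForms(here, clitic));
```

Sorting by index first means "merge left" and "merge right" can share one code path, since the order in which the two tokens are passed no longer matters.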

Oops, I didn't mean to close the whole issue ... but https://github.com/jonorthwash/ud-annotatrix/commit/a3828a4796f3f67e0517e5b8ccdc41b0b901832c should fix this part:

> Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:
>
> 1-2	'sHere	_	_	_	_	_	_	_	_

Thanks for the hint :grin:

keggsmurph21, Mar 21 '22 01:03

> Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

f6865f1

keggsmurph21, Mar 21 '22 01:03

Thanks, pulled the update. Now I find that if I create several multiword tokens and then select one of them to split ("s"), it may split the wrong one.

nschneid, Mar 21 '22 02:03