ud-annotatrix Multiword token range sometimes being saved as HEAD

For words in a multiword token, when I export (download) a .conllu file, their dependents sometimes have the whole MWT range in the HEAD column, e.g. 1-2 instead of 1. This breaks the viewer when I reopen the sentence.

Mar 20 '22 14:03 nschneid

Could you post an example sentence or screenshot that breaks like this?

Mar 20 '22 14:03 kmurphy4

Now my setup is borked and I can't get the viewer to work for any sentence even with a new upload. :( But here is the sentence that was giving me grief:

1-2	Here's	_	_	_	_	_	_	_	_
1	Here	here	ADV	RB	PronType=Dem	0	root	_	start_char=89|end_char=93
2	's	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	1	cop	_	start_char=93|end_char=95
3	the	the	DET	DT	Definite=Def|PronType=Art	4	det	_	start_char=96|end_char=99
4	paper	paper	NOUN	NN	Number=Sing	1	nsubj	_	start_char=100|end_char=105
5	that	that	SCONJ	WDT	PronType=Rel	11	mark	_	start_char=106|end_char=110
6	people	people	NOUN	NNS	Number=Plur	11	nsubj	_	start_char=111|end_char=117
7	who	who	PRON	WP	PronType=Rel	8	nsubj	_	start_char=118|end_char=121
8	read	read	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	6	acl:relcl	_	start_char=122|end_char=126
9	it	it	PRON	PRP	Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs	8	obj	_	start_char=127|end_char=129
10	will	will	AUX	MD	VerbForm=Fin	11	aux	_	start_char=130|end_char=134
11	find	find	VERB	VB	VerbForm=Inf	4	acl:relcl	_	start_char=135|end_char=139
12	out	out	ADP	RP	_	11	compound:prt	_	start_char=140|end_char=143
13	about	about	SCONJ	IN	_	17	mark	_	start_char=144|end_char=149
14	how	how	ADV	WRB	PronType=Int	17	advmod	_	start_char=150|end_char=153
15	resumptive	resumptive	ADJ	JJ	Degree=Pos	16	amod	_	start_char=154|end_char=164
16	pronouns	pronoun	NOUN	NNS	Number=Plur	17	nsubj	_	start_char=165|end_char=173
17	help	help	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	11	advcl	_	start_char=174|end_char=178
18	islands	island	NOUN	NNS	Number=Plur	17	obj	_	start_char=179|end_char=186
19	go	go	VERB	VB	VerbForm=Inf	17	xcomp	_	start_char=187|end_char=189
20	down	down	ADP	IN	_	19	compound:prt	_	start_char=190|end_char=194
21	a	a	DET	DT	Definite=Ind|PronType=Art	22	det	_	start_char=195|end_char=196
22	little	little	ADJ	JJ	Degree=Pos	23	obl:npmod	_	start_char=197|end_char=203
23	easier	easier	ADJ	JJR	Degree=Cmp	19	advmod	_	start_char=204|end_char=210
24	.	.	PUNCT	.	_	1	punct	_	start_char=210|end_char=211

Somehow token 2 was becoming unattached and token 24 was showing up with 1-2 as its head.
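
A quick way to catch this kind of corruption before reopening an exported file is to scan the HEAD column. In CoNLL-U, HEAD on a word line must be 0 or the integer ID of a word in the same sentence; multiword-token range lines such as `1-2` carry `_` there and can never serve as heads. The sketch below is a hypothetical standalone check (`findBadHeads` is not part of ud-annotatrix), assuming tab-separated 10-column lines:

```javascript
// Hypothetical helper (not part of ud-annotatrix): list word lines whose HEAD
// is neither 0 nor the integer ID of a word in the same sentence.
function findBadHeads(conllu) {
  const rows = conllu
    .split("\n")
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .map((line) => line.split("\t"));

  // Word lines have plain integer IDs; multiword-token ranges ("1-2") and
  // empty nodes ("8.1") are not words that HEAD may point to.
  const words = rows.filter((cols) => /^\d+$/.test(cols[0]));
  const wordIds = new Set(words.map((cols) => cols[0]));

  const problems = [];
  for (const cols of words) {
    const id = cols[0];
    const form = cols[1];
    const head = cols[6];
    if (head !== "0" && !wordIds.has(head)) problems.push({ id, form, head });
  }
  return problems;
}

// A made-up reduced example with both symptoms: 's unattached, "." headed by 1-2.
const sample = [
  "1-2\tHere's\t_\t_\t_\t_\t_\t_\t_\t_",
  "1\tHere\there\tADV\tRB\t_\t0\troot\t_\t_",
  "2\t's\tbe\tAUX\tVBZ\t_\t_\t_\t_\t_",
  "3\t.\t.\tPUNCT\t.\t_\t1-2\tpunct\t_\t_",
].join("\n");
console.log(findBadHeads(sample));
```

On the reduced example this flags word 2 (HEAD `_`) and word 3 (HEAD `1-2`); on the sentence as posted above, every HEAD is 0 or an existing word ID, so it returns an empty list.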

Mar 20 '22 14:03 nschneid

The viewer is no longer working in Firefox, but I'm able to view and download the sentence in Chrome just fine, and it downloads the correct parse. Do I need to clear my Firefox cache or something?

Mar 20 '22 14:03 nschneid

And it works in Firefox Private Browsing mode, so something got messed up in my browser session.

Mar 20 '22 18:03 nschneid

Hm, if I copy-paste that sentence into the textbox, it seems to work?

What else do I need to do to repro your issue?

Mar 20 '22 22:03 kmurphy4

Not sure. In a new browser session I can't reproduce it. It must have something to do with corrupted local storage or whatever in my original session.

Mar 20 '22 22:03 nschneid

Oh, but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.
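
For comparison, the expected result of merge-left can be pinned down with a tiny reference function that a regression test could check the editor against. This is a hypothetical sketch over a plain array of word forms (0-based indices, multiword tokens ignored), not how notatrix stores a sentence:

```javascript
// Hypothetical reference for "merge left" (not ud-annotatrix code): combine
// the word at a 0-based index with its left neighbor and leave every other
// word exactly as it was.
function mergeLeft(forms, index) {
  if (!Number.isInteger(index) || index < 1 || index >= forms.length) {
    throw new RangeError(`cannot merge the word at index ${index} to the left`);
  }
  const result = forms.slice(); // copy so the caller's array is not mutated
  result.splice(index - 1, 2, forms[index - 1] + forms[index]);
  return result;
}

// Merging the third word ("the", index 2) left must not touch "Here" or "paper".
console.log(mergeLeft(["Here", "'s", "the", "paper"], 2)); // [ 'Here', "'sthe", 'paper' ]
```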

Mar 20 '22 22:03 nschneid

Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:

1-2	'sHere	_	_	_	_	_	_	_	_
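
The expected line here is `1-2 Here's`: both the range and the form should follow word order, not the order in which the two words were selected. A minimal sketch, assuming words are plain `{ id, form }` objects (a hypothetical shape, not notatrix's token class) and that the multiword form is simply the two forms joined, as in the merge output above:

```javascript
// Hypothetical sketch (not notatrix's Sentence code): build the CoNLL-U range
// line for a multiword token made from two adjacent words, in whatever order
// the two words were selected.
function makeMultiwordLine(a, b) {
  // Order by word ID so the range and the surface form read left to right.
  const [first, second] = a.id < b.id ? [a, b] : [b, a];
  if (second.id !== first.id + 1) {
    throw new Error("a two-word multiword token must span adjacent words");
  }
  // Assumption: the surface form is the two forms joined, as the merge does.
  const form = first.form + second.form;
  // ID and FORM are filled in; the other eight columns are "_" on a range line.
  return [`${first.id}-${second.id}`, form, ...Array(8).fill("_")].join("\t");
}

const here = { id: 1, form: "Here" };
const clitic = { id: 2, form: "'s" };
// Merging left from word 2 hands the words over in reverse order:
console.log(makeMultiwordLine(clitic, here).split("\t").slice(0, 2).join(" ")); // 1-2 Here's
```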

Mar 20 '22 22:03 nschneid

This should check the indices to see which token is first (combining two tokens into a multiword token/supertoken):

https://github.com/jonorthwash/ud-annotatrix/blob/e951f7255e5b02b8c545af13a4a4e1e53f67054f/notatrix/src/nx/sentence.js#L381

Also here (merging two tokens into one regular token):

https://github.com/jonorthwash/ud-annotatrix/blob/e951f7255e5b02b8c545af13a4a4e1e53f67054f/notatrix/src/nx/sentence.js#L324
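
Merging into one regular token has a second invariant on top of the ordering: once two IDs collapse into one, every later ID shifts down by one, and any HEAD that pointed at a shifted or absorbed word has to move with it, or dependents end up pointing at the wrong word or at nothing. A hypothetical sketch over a flat list of `{ id, form, head }` objects with consecutive integer IDs (not notatrix's data model); which head the merged word keeps is an assumption, flagged in the code:

```javascript
// Hypothetical sketch (not notatrix's Sentence code): merge two adjacent words
// into one regular word, ordered by ID, and renumber IDs and HEADs so that
// every HEAD still points at an existing word (or 0 for the root).
function mergeWords(words, idA, idB) {
  const lo = Math.min(idA, idB);
  const hi = Math.max(idA, idB);
  const first = words.find((w) => w.id === lo);
  const second = words.find((w) => w.id === hi);
  if (!first || !second || hi !== lo + 1) {
    throw new Error("can only merge two adjacent, existing words");
  }

  // IDs after `hi` shift down by one; anything that pointed at `hi` now
  // points at the merged word, which keeps ID `lo`.
  const remap = (id) => (id === hi ? lo : id > hi ? id - 1 : id);

  // Assumption: the merged word keeps the first word's head, unless that head
  // was the second word, in which case it inherits the second word's head.
  const mergedHead = first.head === hi ? second.head : first.head;

  return words
    .filter((w) => w.id !== hi)
    .map((w) =>
      w.id === lo
        ? { id: lo, form: first.form + second.form, head: remap(mergedHead) }
        : { id: remap(w.id), form: w.form, head: remap(w.head) }
    );
}

const words = [
  { id: 1, form: "Here", head: 0 },
  { id: 2, form: "'s", head: 1 },
  { id: 3, form: "the", head: 4 },
  { id: 4, form: "paper", head: 1 },
  { id: 5, form: ".", head: 1 },
];
// Selecting "'s" first and merging left must still give "Here's".
const merged = mergeWords(words, 2, 1);
console.log(merged.map((w) => `${w.id}:${w.form}->${w.head}`).join(" "));
// 1:Here's->0 2:the->3 3:paper->1 4:.->1
```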

Mar 20 '22 23:03 nschneid

Oops, I didn't mean to close the whole issue ... but https://github.com/jonorthwash/ud-annotatrix/commit/a3828a4796f3f67e0517e5b8ccdc41b0b901832c should fix this part:

Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:
1-2	'sHere	_	_	_	_	_	_	_	_

Thanks for the hint :grin:

Mar 21 '22 01:03 keggsmurph21

Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

f6865f1

Mar 21 '22 01:03 keggsmurph21

Thanks, pulled the update. Now I find that if I create several multiword tokens and then select one of them to split ("s"), it may split the wrong one.
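
Whatever the cause turns out to be, the expected behavior is easy to state: "s" should act on the multiword token the user selected, identified by its word range rather than by its position in some list, and every other multiword token should stay in place. A hypothetical sketch, assuming multiword tokens are kept as `{ first, last, form }` records alongside the words (not notatrix's actual structure; the tokens in the example are made up):

```javascript
// Hypothetical model (not notatrix's data structures): multiword tokens kept
// alongside the words, each identified by the IDs of its first and last word.
function splitMultiwordToken(mwts, firstWordId) {
  const target = mwts.find((m) => m.first === firstWordId);
  if (!target) {
    throw new Error(`no multiword token starts at word ${firstWordId}`);
  }
  // Splitting only removes the range line; the underlying words already exist.
  return mwts.filter((m) => m !== target);
}

const mwts = [
  { first: 1, last: 2, form: "Here's" },
  { first: 6, last: 7, form: "don't" },
  { first: 9, last: 10, form: "it's" },
];
// Selecting the token spanning words 6-7 should split that one and no other.
console.log(splitMultiwordToken(mwts, 6).map((m) => `${m.first}-${m.last}`)); // [ '1-2', '9-10' ]
```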

Mar 21 '22 02:03 nschneid
