spaCy-Thai
Can you train on the Thai Treebanks dataset?
I found thtb_orchidpp.txt. The file is a treebank dataset from the Orchid corpus, but it is not in CoNLL-U format.
Umm... The dataset seems to be in a phrase-structure format. For example, the first line
[S [NP [FIXN การ]] [VP [VACT ประชุม] [PP [RPRE ทาง] [NP [NCMN วิชาการ] [PUNC <space>] [NP [NCMN ครั้ง] [DONM ที่ 1]]]]]]
denotes a phrase tree (shown as an image in the original post).
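To make the bracket notation concrete, here is a minimal sketch of reading such a line into a nested structure. The function names `parse_brackets` and `leaves` are made up for illustration, and the sketch assumes every line follows the `[LABEL child ...]` pattern of the example above; the real file may contain cases it does not handle.

```python
import re

def parse_brackets(text):
    """Parse '[LABEL child ...]' into (label, body) tuples.

    body is a list of child nodes for a phrase, or a string for a leaf.
    """
    tokens = re.findall(r"\[|\]|[^\[\]\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "[", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children, words = [], []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                words.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        # Assumption: a node holds either sub-phrases or words, never both.
        return (label, children if children else " ".join(words))

    return node()

def leaves(tree):
    """Return the (tag, text) pairs of the tree from left to right."""
    label, body = tree
    if isinstance(body, str):
        return [(label, body)]
    return [leaf for child in body for leaf in leaves(child)]

line = ("[S [NP [FIXN การ]] [VP [VACT ประชุม] [PP [RPRE ทาง] "
        "[NP [NCMN วิชาการ] [PUNC <space>] [NP [NCMN ครั้ง] [DONM ที่ 1]]]]]]")
tree = parse_brackets(line)
for tag, text in leaves(tree):
    print(tag, text)
```

Note that the last leaf keeps "ที่ 1" as a single unit, as in the treebank line, whereas the CoNLL-U analysis splits it into two tokens.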
I trained spaCy-Thai with dependency trees, which are very different from phrase trees:
# text = การประชุมทางวิชาการ ครั้งที่ 1
1 การ _ PART FIXN _ 0 root _ SpaceAfter=No
2 ประชุม _ VERB VACT _ 1 acl _ SpaceAfter=No
3 ทาง _ ADP RPRE _ 4 case _ SpaceAfter=No
4 วิชาการ _ NOUN NCMN _ 2 obl _ _
5 ครั้ง _ NOUN NCMN _ 1 list _ SpaceAfter=No
6 ที่ _ DET PREL _ 7 det _ _
7 1 _ NUM DCNM _ 5 nummod _ SpaceAfter=No
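For comparison, the CoNLL-U rows can be read with a few lines of plain Python. This is only a sketch: `read_conllu` and `surface_text` are hypothetical helper names, and the rows are split on any whitespace because they appear space-separated in this thread (real CoNLL-U files use tabs, and a proper reader should split on tabs only).

```python
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

SAMPLE = """\
# text = การประชุมทางวิชาการ ครั้งที่ 1
1 การ _ PART FIXN _ 0 root _ SpaceAfter=No
2 ประชุม _ VERB VACT _ 1 acl _ SpaceAfter=No
3 ทาง _ ADP RPRE _ 4 case _ SpaceAfter=No
4 วิชาการ _ NOUN NCMN _ 2 obl _ _
5 ครั้ง _ NOUN NCMN _ 1 list _ SpaceAfter=No
6 ที่ _ DET PREL _ 7 det _ _
7 1 _ NUM DCNM _ 5 nummod _ SpaceAfter=No
"""

def read_conllu(block):
    """Return one dict per ordinary token row; comments are skipped."""
    tokens = []
    for line in block.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split()
        # Skip rows that are not plain tokens (ranges, empty nodes, damage).
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        tok = dict(zip(FIELDS, cols))
        tok["id"], tok["head"] = int(tok["id"]), int(tok["head"])
        tokens.append(tok)
    return tokens

def surface_text(tokens):
    """Rebuild the sentence, honouring SpaceAfter=No in the MISC column."""
    parts = []
    for tok in tokens:
        parts.append(tok["form"])
        if "SpaceAfter=No" not in tok["misc"]:
            parts.append(" ")
    return "".join(parts).rstrip()

tokens = read_conllu(SAMPLE)
forms = {tok["id"]: tok["form"] for tok in tokens}
for tok in tokens:
    head = forms.get(tok["head"], "ROOT")
    print(f'{head} -[{tok["deprel"]}]-> {tok["form"]}')
```

Each printed line is one arc of the dependency tree, written in the same `head -[rel]-> dependent` style used later in this thread.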
The dependency tree, on the other hand, was visualized as an image in the original post.
Well, how do we convert the phrase structure and the dependency tree into one another, @wannaphong?
Sorry, I do not know because it is beyond the scope of my expertise. I think @korakot should help with this.
It's possible in theory. A constituency tree can be converted to a dependency tree with no ambiguity. For example:
VP = V + NP can be converted to V -[dobj]-> NP
But there's no ready-made library to do it for Thai (or even for many other languages). You may need to convert them one by one.
You can search Google to find some papers and one GitHub repository on this. https://www.google.com/search?q=convert+constituency+tree+to+dependency+tree
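The usual way to do this kind of conversion is with head rules: for each phrase, pick one child as the head and attach the other children to it. Below is a minimal sketch of that idea, not a converter for the Orchid data. The `HEAD_RULES` table, the toy labels (S, NP, VP, V, N, DET), and the function name `to_dependencies` are all assumptions made for illustration, and the arcs come out unlabeled.

```python
# Hypothetical head rules: for each phrase label, the child labels to
# try in order when choosing the head child.
HEAD_RULES = {"S": ["VP"], "VP": ["V"], "NP": ["N"], "PP": ["NP"]}

def to_dependencies(tree, head_rules=HEAD_RULES):
    """Convert a (label, body) tree into words plus (head, dependent) arcs.

    Words are numbered from 1; head 0 marks the root, as in CoNLL-U.
    """
    words, arcs = [], []

    def visit(node):
        label, body = node
        if isinstance(body, str):          # leaf: register the word
            words.append(body)
            return len(words)
        child_heads = [visit(child) for child in body]
        pick = 0                           # fallback: leftmost child
        for wanted in head_rules.get(label, []):
            matches = [i for i, child in enumerate(body) if child[0] == wanted]
            if matches:
                pick = matches[0]
                break
        head = child_heads[pick]
        # Every non-head child hangs off the head child's head word.
        for i, child_head in enumerate(child_heads):
            if i != pick:
                arcs.append((head, child_head))
        return head

    arcs.append((0, visit(tree)))
    return words, sorted(arcs, key=lambda arc: arc[1])

toy = ("S", [("NP", [("N", "dogs")]),
             ("VP", [("V", "eat"),
                     ("NP", [("DET", "the"), ("N", "rice")])])])
words, arcs = to_dependencies(toy)
for head, dep in arcs:
    print("ROOT" if head == 0 else words[head - 1], "->", words[dep - 1])
```

With the VP rule listing V first, VP = V + NP yields an arc from the verb to the head of the NP, which is the V -> NP case above. Choosing the relation label for each arc is a separate step.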
Korakot
VP = V + NP can be converted to V -[dobj]-> NP
Oh, it looks very nice. But I'm not sure whether S = NP + VP
should be converted into NP <-[nsubj]- VP
or NP <-[vocative]- VP
or NP -[acl]-> VP
...
For S = NP + VP, you need to look inside the NP and the VP to know which [rel] applies. It's not ambiguous, though. You need a few if-then cases on PoS tags and word groups. It's a bit labor-intensive to list all the cases.
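As a rough illustration of what such if-then cases could look like, here is a sketch covering only the three candidates raised above. The function name `relate_np_vp` and the `vocatives` word list are hypothetical. The FIXN case is taken from the example sentence in this thread, where การ (FIXN) is the root and the verb attaches to it as acl; a real rule set would need many more cases.

```python
def relate_np_vp(np_head_tag, np_head_word, vocatives=frozenset()):
    """Decide the arc for S = NP + VP.

    Returns (head_side, deprel), where head_side names the phrase whose
    head word governs the other phrase.
    """
    # Case seen in this thread: a nominalizing prefix heads the sentence
    # and the verb phrase modifies it.
    if np_head_tag == "FIXN":
        return ("NP", "acl")
    # Assumption: the caller supplies a list of address terms; this thread
    # does not say how vocatives are marked in the treebank.
    if np_head_word in vocatives:
        return ("VP", "vocative")
    # Default: the NP is the subject of the verb.
    return ("VP", "nsubj")

print(relate_np_vp("FIXN", "การ"))
print(relate_np_vp("NCMN", "ครู", vocatives={"ครู"}))
print(relate_np_vp("NCMN", "นักเรียน"))
```

The order of the checks matters: the more specific cases come first and nsubj acts as the fallback.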
Korakot