udpipe
How do I pass pre-tokenized text (multiple paragraphs, each with multiple sentences) to UDPipe for parsing?
The UDPipe Chinese model is very bad at tokenization. I need to manually separate a document into paragraphs, then split each paragraph into sentences, then tokenize each sentence with jiebaR. After that I need to feed the result into udpipe for tagging and parsing. I read the official documentation and tried a lot of things, with no luck. I'm not familiar with R.
Many thanks!
If your text is already tokenized: see https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-annotation.html
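As a rough sketch of what that could look like for your case (paragraphs of sentences of tokens): UDPipe's "horizontal" input format expects one sentence per line with tokens separated by spaces, so you can paste your jiebaR tokens together and pass `tokenizer = "horizontal"` to `udpipe_annotate`. The `doc` object, the helper names `to_horizontal` and `annotate_pretokenized`, and the use of paragraph names as `doc_id` are assumptions made for illustration, not something from the vignette.

```r
# Hypothetical input: a list of paragraphs, each a list of sentences,
# each sentence a character vector of tokens (e.g. produced by jiebaR).
doc <- list(
  p1 = list(c("我", "爱", "北京", "。"), c("天气", "很", "好", "。")),
  p2 = list(c("他", "在", "学习", "中文", "。"))
)

# Build one string per paragraph in "horizontal" format:
# tokens joined by spaces, sentences joined by newlines.
to_horizontal <- function(paragraph) {
  sentences <- vapply(paragraph, paste, character(1), collapse = " ")
  paste(sentences, collapse = "\n")
}
txt <- vapply(doc, to_horizontal, character(1))

# Tag and parse without re-tokenizing. `model_file` is assumed to be the
# path to a UDPipe model already on disk (for example one fetched with
# udpipe::udpipe_download_model()).
annotate_pretokenized <- function(txt, model_file) {
  ud <- udpipe::udpipe_load_model(model_file)
  ann <- udpipe::udpipe_annotate(ud, x = txt, doc_id = names(txt),
                                 tokenizer = "horizontal")
  as.data.frame(ann)
}
```

Calling `annotate_pretokenized(txt, model_file)` with the path to your Chinese model should give a data frame with one row per token. Here `doc_id` identifies the paragraph and `sentence_id` the sentence within it, and the tags and dependencies are in columns such as `upos`, `head_token_id` and `dep_rel`. If you would rather have one token per line, the "vertical" tokenizer shown in the vignette works the same way, with an empty line between sentences.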