
The `conllu` package should suffice for parsing UD corpora: https://github.com/EmilStenstrom/conllu
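A minimal sketch of what that looks like (the CoNLL-U snippet below is made up for illustration):

```python
import conllu

# A tiny CoNLL-U sample; fields are tab-separated.
data = (
    "# text = I like it.\n"
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tlike\tlike\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\tit\tit\tPRON\tPRP\t_\t2\tobj\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

# conllu.parse returns a list of sentences (TokenList objects);
# each token behaves like a dict keyed by the CoNLL-U columns.
sentences = conllu.parse(data)
for token in sentences[0]:
    print(token["form"], token["lemma"])
```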

@arademaker The problem concerns the word tokenizers, not the Punkt sentence tokenizer. Are they based on the same algorithms?

In my use cases, I do not use `nltk.word_tokenize` (which calls `nltk.sent_tokenize` first). I call `TreebankWordTokenizer` and `NLTKWordTokenizer` directly for word tokenization (sentence tokenization, if needed, is handled...
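For reference, a minimal sketch of that usage (assuming a recent NLTK version, where `NLTKWordTokenizer` is exported from `nltk.tokenize`):

```python
from nltk.tokenize import NLTKWordTokenizer, TreebankWordTokenizer

sent = "Good muffins cost $3.88 in New York."

# Both tokenizers operate on a single, already-segmented sentence;
# no Punkt sentence splitting is involved.
print(TreebankWordTokenizer().tokenize(sent))
print(NLTKWordTokenizer().tokenize(sent))
```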

@wannaphong There are no spaces between Thai words, only between sentences, am I right? I do not speak Thai, so I can't give examples here. But when Thai words are...

@wannaphong Is `clause_tokenize` required to get the correct detokenized string? If so, perhaps `clause_tokenize` could be called implicitly inside `word_detokenize`?
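A rough sketch of that idea, as a hypothetical wrapper rather than a proposed API (it assumes `clause_tokenize` accepts a flat list of word tokens and returns clauses, and that `word_detokenize` accepts a list of clauses):

```python
from typing import List

from pythainlp.tokenize import clause_tokenize, word_detokenize


def word_detokenize_with_clauses(tokens: List[str]) -> str:
    # Hypothetical wrapper: group the flat token list into clauses
    # first, then detokenize, so callers never have to call
    # clause_tokenize themselves.
    clauses = clause_tokenize(tokens)
    return word_detokenize(clauses)
```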

Since I do not speak Thai, I'm a bit confused about some points. The input of `word_detokenize` can be either a list of tokens (strings) or a list of sub-lists of strings...

And how are English words (and words in other Indo-European languages) handled in `word_detokenize` (are spaces added between Thai and English words or not)?

There is a bug in the macOS version, which should be fixed in the next version. You may use the Windows version for now.

Fixed in [2.3.0](https://github.com/BLKSerene/Wordless/releases/tag/2.3.0), please give it a try.

The font rendering issues on high-DPI displays will be addressed gradually in future versions.