BLKSerene

Results 56 comments of BLKSerene

I use spaCy to do word tokenization and sentence tokenization for all languages supported by spaCy in my project (which is a multi-lingual corpus processing and analyzing tool packaged and...

In my project, I need to fetch stop words of all languages provided by spaCy, so I have to use the `importlib` way with f-string and did not run into...
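A minimal sketch of the `importlib` approach with an f-string, assuming spaCy keeps each language's stop words in a `STOP_WORDS` set under `spacy.lang.<code>.stop_words`. The helper name `load_stop_words` and its `package` parameter are hypothetical, introduced only for illustration, and the empty-set fallback is a design choice of this sketch, not something taken from the comment.

```python
import importlib


def load_stop_words(lang_code, package="spacy.lang"):
    """Return the stop word set for one language code, or an empty set.

    `package` is a hypothetical parameter; it defaults to spaCy's
    language package and exists so the lookup can be tried elsewhere.
    """
    # Build the module path dynamically, e.g. "spacy.lang.en.stop_words"
    module_name = f"{package}.{lang_code}.stop_words"
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed, or this language has no such module
        return set()
    # Assumption: the module exposes its words as STOP_WORDS
    return set(getattr(module, "STOP_WORDS", ()))
```

With spaCy installed, `load_stop_words("en")` would be expected to return the English stop words; an unknown language code falls back to an empty set instead of raising.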

I propose removing `tinydb` as an external dependency (as mentioned in #506). It seems to me that using a third-party database library is overkill for just querying a...

When working on #691, I found a problem relating to the version checking behavior of the corpus downloader. The downloader checks both the name and the version [here](https://github.com/PyThaiNLP/pythainlp/blob/976eb28306626b17a427d588b6b90a30e69eafb3/pythainlp/corpus/core.py#L404), so [later](https://github.com/PyThaiNLP/pythainlp/blob/976eb28306626b17a427d588b6b90a30e69eafb3/pythainlp/corpus/core.py#L464)...

@wannaphong I mean that since both the name and the version of the corpus are checked, the [else block](https://github.com/PyThaiNLP/pythainlp/blob/976eb28306626b17a427d588b6b90a30e69eafb3/pythainlp/corpus/core.py#L467) would never be reached, so the user will never be notified...

@wannaphong So what is the expected behavior? If the user should be notified to use `force = True` when newer versions of the corpus are available, I could work on...

@wannaphong Should be fixed in #692, please review the PR.

I have the same issue here. I only recently found out that the problem is `python-crfsuite`. It would be great if this pull request could be merged and released.

Any updates on this PR?

There are two UD Chinese corpora:

- Simplified Chinese: https://github.com/UniversalDependencies/UD_Chinese-GSDSimp
- Traditional Chinese: https://github.com/UniversalDependencies/UD_Chinese-GSD

What are the requirements for the training data? And the license?