pythainlp
Thai Natural Language Processing in Python.
### Description There are hundreds of warnings like this during unit tests: ``` 2023-12-11:03:40:47 WARNING [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first ``` ### Expected results...
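One possible way to quiet warnings of this kind, without touching the word2vec file itself, is a standard-library logging filter. The sketch below is only an illustration, not the fix adopted by the project: the class name `DuplicateWordFilter` is hypothetical, and the logger name is assumed from the `[gensim.models.keyedvectors:1909]` prefix in the warning above.

```python
import logging


class DuplicateWordFilter(logging.Filter):
    """Drop log records that report a duplicate word in a word2vec file.

    Hypothetical helper for illustration; not part of PyThaiNLP or gensim.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; all other records pass through.
        return "duplicate word" not in record.getMessage()


# Assumption: the warning is emitted by the logger named after the module
# shown in the log line quoted in the issue.
gensim_logger = logging.getLogger("gensim.models.keyedvectors")
gensim_logger.addFilter(DuplicateWordFilter())
```

A filter attached this way only affects records logged directly on that logger, so other gensim warnings would still be shown.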
### Description The macOS test configuration with `matrix.os = "self-hosted"` always fails. [.github/workflows/macos-test.yml](https://github.com/PyThaiNLP/pythainlp/blob/dev/.github/workflows/macos-test.yml) ### Expected results If nothing is wrong with the code itself, the test should pass. Can compare this...
I tried the `crfcut` engine in the `sent_tokenize` function in the stable release of PyThaiNLP, installed via ```cli pip install --upgrade pythainlp ``` This is what I expected: ```python sent_tokenize(sentence_1, engine="crfcut") #...
## Detailed description Add tests with strings from the [Big List of Naughty Strings](https://github.com/minimaxir/big-list-of-naughty-strings) to check the robustness of the library. ## Context The Big List of Naughty Strings is "an...
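A robustness test of this kind could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's test suite: `SAMPLE_STRINGS` is a small hand-picked sample in the spirit of the list (the full list from the linked repository is not reproduced here), and `find_crashes` is a hypothetical helper that accepts any tokenizer-like callable.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical sample of awkward inputs; a real test would load the full
# list from the linked repository instead.
SAMPLE_STRINGS = [
    "",                           # empty string
    " ",                          # whitespace only
    "\t\n",                       # control whitespace
    "undefined",                  # reserved-looking word
    "-1",                         # numeric-looking text
    "<script>alert(1)</script>",  # markup injection
    "\u200b",                     # zero-width space
    "ก" * 1000,                   # long run of a single Thai character
]


def find_crashes(
    tokenize: Callable[[str], Iterable[str]],
    strings: Iterable[str],
) -> List[Tuple[str, Exception]]:
    """Return (input, exception) pairs for inputs the tokenizer cannot handle."""
    failures = []
    for text in strings:
        try:
            tokens = list(tokenize(text))
            # A robust tokenizer should return only strings.
            if not all(isinstance(token, str) for token in tokens):
                raise TypeError("tokenizer returned a non-string token")
        except Exception as exc:  # record the failure and keep going
            failures.append((text, exc))
    return failures
```

With PyThaiNLP installed, passing `pythainlp.tokenize.word_tokenize` as `tokenize` would exercise the default word tokenizer; an empty result list means no sampled input caused an exception.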
I tried the `newmm-safe` engine, but it gave inconsistent results: sometimes it tokenized correctly and sometimes it did not. ## Description Example: "ในฐานข้อมูลกฎหมายของเว็บไซต์ ทส. ข้อมูลและทรัพยากร ข้อมูลกฎหมายว่าด้วยป่าชุมชน CSV downloads กฎหมายแม่บท และกฎหมายลำดับรอง ของพระราชบัญญัติป่าชุมชน พ.ศ. 2562..."...
PyThaiNLP currently uses thai2fit with fastai v1, which is a very old version of fastai. Thai2fit needs to be ported from fastai v1 to fastai v2.
Hi all, I was wondering whether there is any way to exclude some words during translation.
## Description Cannot build the Docker image on M1 macOS ## Expected results Docker builds successfully ## Current results ``` [+] Building 76.5s (9/10) => [internal] load build definition from Dockerfile...
### Description To use a free accelerator, I'm trying to run it on https://kaggle.com with the GPU T4 x2 configuration, but I get a `load_model` error: not enough memory. Could you help...