pythainlp

Thai Natural Language Processing in Python.

Results 71 pythainlp issues

## Description I tried to tokenize the text `"ทดสอบตัดคำภาษาไทยจอก์น"`. ## Expected results `['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก์', 'น']` ## Current results `['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก', '์น']` ## Steps to...

bug
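
For the tokenization issue above, one possible stopgap is to post-process the token list so that no token begins with a combining mark such as thanthakhat (`์`). The sketch below uses only the standard library. `reattach_leading_marks` is a hypothetical helper, not part of PyThaiNLP, and it works around the symptom rather than fixing the tokenizer itself.

```python
import unicodedata


def reattach_leading_marks(tokens):
    """Move combining marks at the start of a token onto the previous token.

    Hypothetical post-processing helper (an assumption, not a PyThaiNLP API).
    A mark is any character whose Unicode general category is "Mn".
    """
    fixed = []
    for token in tokens:
        # Count how many combining marks the token starts with.
        i = 0
        while i < len(token) and unicodedata.category(token[i]) == "Mn":
            i += 1
        # Only move the marks if there is a previous token to receive them.
        if i and fixed:
            fixed[-1] += token[:i]
            token = token[i:]
        # Skip tokens that became empty after their marks were moved.
        if token:
            fixed.append(token)
    return fixed


current = ['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก', '์น']
print(reattach_leading_marks(current))
```

Applied to the "current results" list from the issue, this yields the list given under "expected results".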

In the future, I think we should port pythainlp models to ONNX. ONNX is a standard model format that other frameworks support, and it can use OS framework to...

refactoring
Hacktoberfest

`python-crfsuite` has fixed its Python 3.10 problem, but a new version has not been released to PyPI yet. https://github.com/scrapinghub/python-crfsuite/issues/139 I think we should change all python-crfsuite models to PyTorch models.

refactoring
Hacktoberfest

## Description I've been contacted via email that AttaCut (possibly other tokenizers as well) cannot cope well when encountering texts like below ``` - 'เจอกันตอน 17.00น.' - actual: ['เจอ', 'กัน',...

## Trie - [ ] Add trie for OOV words - @korakot ## Dependency parsing - [x] Add dependency parsing to PyThaiNLP #606 [WIP]

documentation
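
The trie item in the checklist above concerns dictionary lookup for out-of-vocabulary (OOV) words. As a rough illustration of the data structure only, here is a minimal dict-based trie. `SimpleTrie` and its method names are hypothetical and chosen for this sketch; they are not claimed to match any PyThaiNLP class.

```python
class SimpleTrie:
    """Minimal prefix tree built from nested dicts (illustrative only)."""

    _END = ""  # End-of-word marker; cannot clash with a one-character key.

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word, creating child nodes as needed."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def prefixes(self, text):
        """Return every stored word that is a prefix of `text`."""
        node = self.root
        found = []
        for i, ch in enumerate(text):
            node = node.get(ch)
            if node is None:
                break
            if self._END in node:
                found.append(text[: i + 1])
        return found


trie = SimpleTrie(["ตัด", "ตัดคำ", "คำ"])
print(trie.prefixes("ตัดคำภาษาไทย"))  # ['ตัด', 'ตัดคำ']
```

A dictionary-based tokenizer can call `prefixes` at each position to list candidate words; adding OOV words then only requires calling `add`.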

Now that NECTEC has released the Blackboard Treebank, we want to add dependency parsing based on it to PyThaiNLP 3.0. Facebook: https://web.facebook.com/dancearmy/posts/10158423653343284 Bitbucket: https://bitbucket.org/kaamanita/blackboard-treebank/

enhancement

## Detailed description From [Thai NLP Meetup #7](https://web.facebook.com/AIResearch.in.th/videos/1474022956330608), I got user feedback. They want builder tools in pythainlp so that they can use their own models in pythainlp. ## Context benefit:...

enhancement

## Description `pythainlp.util.collate()` produces a wrong ordering, because the current implementation ignores tone marks and symbols when ordering. Try this code: ```python from pythainlp.util import collate collate(["ก้วย", "ก๋วย", "ก่วย", "กวย",...

bug
help wanted
Hacktoberfest
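
For the `collate()` ordering issue above, one way to stop tone marks from being ignored is a two-level sort key: compare words with tone marks removed first, then use the tone marks as a tie-breaker. This is a simplified sketch under stated assumptions, not PyThaiNLP's implementation. `tone_aware_key` is a hypothetical name, only the four tone marks are handled (other symbols and leading-vowel reordering are left out), and ties are broken by code point order.

```python
# The four Thai tone marks: mai ek, mai tho, mai tri, mai chattawa.
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"


def tone_aware_key(word):
    """Two-level key: letters without tone marks, then the tone marks."""
    base = "".join(ch for ch in word if ch not in TONE_MARKS)
    tones = "".join(ch for ch in word if ch in TONE_MARKS)
    return (base, tones)


words = ["ก้วย", "ก๋วย", "ก่วย", "กวย"]
print(sorted(words, key=tone_aware_key))
```

Words that differ only by tone mark share the same first key component, so the unmarked form sorts first and the marked forms follow in code point order.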

## Detailed description From #614, we now have a way to produce misspellings for Thai and English text. One use case of the module would be to simulate out-of-distribution (OOD) datasets...

enhancement
help wanted
Hacktoberfest
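
To make the OOD idea above concrete, the sketch below perturbs a clean text at a chosen ratio so that a model can be evaluated at several noise levels. It deliberately does not use the module from #614: `perturb` and the tiny `NEIGHBORS` map are hypothetical stand-ins invented for this illustration, and a real setup would substitute the library's own misspelling generator.

```python
import random

# Hypothetical, deliberately tiny map of "nearby key" substitutions.
NEIGHBORS = {"a": "qs", "e": "wr", "i": "uo", "o": "ip", "t": "ry"}


def perturb(text, ratio, seed=0):
    """Replace each mapped character with probability `ratio`.

    A fixed seed keeps the generated noisy dataset reproducible.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in NEIGHBORS and rng.random() < ratio:
            out.append(rng.choice(NEIGHBORS[ch]))
        else:
            out.append(ch)
    return "".join(out)


clean = "natural language processing"
# Build evaluation variants at increasing noise levels.
for ratio in (0.0, 0.3, 1.0):
    print(ratio, perturb(clean, ratio))
```

Comparing model accuracy across the variants gives a rough picture of how quickly performance degrades as the input drifts from the training distribution.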

We want to write a new IPA implementation for `pythainlp.transliterate.transliterate`. - This is to replace [epitran](https://github.com/dmort27/epitran/) and remove the dependency on [marisa-trie](https://github.com/pytries/marisa-trie). - In some situations, marisa-trie has a portability issue with...

enhancement
help wanted
Hacktoberfest