newmm separates " ์ " (thanthakhat, U+0E4C) from its base consonant
Description
I tried to tokenize the text "ทดสอบตัดคำภาษาไทยจอก์น".
Expected results
['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก์', 'น']
Current results
['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก', '์น']
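The difference between the two results can be checked mechanically: in the current output, the last token begins with a combining mark that has lost its base consonant. Below is a minimal sketch of such a check, assuming that "a token must not start with a Unicode nonspacing mark (category Mn)" is an acceptable rule of thumb. The helper name `orphaned_mark_tokens` is hypothetical and not part of PyThaiNLP; the two lists are copied from this report.

```python
import unicodedata


def orphaned_mark_tokens(tokens):
    """Return tokens whose first character is a nonspacing mark (category Mn).

    Hypothetical helper for illustration only; not a PyThaiNLP function.
    """
    return [t for t in tokens if t and unicodedata.category(t[0]) == "Mn"]


# Token lists copied from the expected and current results in this report.
expected = ['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก์', 'น']
current = ['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก', '์น']

print(orphaned_mark_tokens(expected))  # []
print(orphaned_mark_tokens(current))   # ['์น']
```

The same helper could be applied to the output of `word_tokenize` as a regression check once the issue is fixed.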
Steps to reproduce
from pythainlp.tokenize import word_tokenize
# Print the tokens so the result is visible when run as a script
print(word_tokenize("ทดสอบตัดคำภาษาไทยจอก์น"))
Your environment
- PyThaiNLP version: 3.0.5
- Python version: 3.8
- Operating system and version (distro, 32/64-bit): Linux 64 bit
- More info (Docker, VM, etc.): Ubuntu 20.04 Docker