
newmm splits words incorrectly around " ์ " (thanthakhat, U+0E4C)

Open wannaphong opened this issue 2 years ago • 0 comments

Description

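I tried to tokenize the text "ทดสอบตัดคำภาษาไทยจอก์น" with word_tokenize. The " ์ " mark ends up at the start of the next token instead of staying attached to the preceding consonant.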

Expected results

['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก์', 'น']

Current results

['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก', '์น']

Steps to reproduce

from pythainlp.tokenize import word_tokenize

# newmm is the engine named in the title, passed explicitly here
print(word_tokenize("ทดสอบตัดคำภาษาไทยจอก์น", engine="newmm"))
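
One way to spot this kind of split automatically is to look for tokens that begin with a combining mark, since " ์ " should never start a token on its own. The sketch below is only an illustration: `tokens_starting_with_mark` is a hypothetical helper, not part of PyThaiNLP, and it runs on the two result lists quoted above rather than calling the tokenizer.

```python
import unicodedata


def tokens_starting_with_mark(tokens):
    """Return the tokens whose first character is a Unicode combining mark.

    Hypothetical helper for illustration only; not a PyThaiNLP function.
    """
    return [
        tok
        for tok in tokens
        if tok and unicodedata.category(tok[0]).startswith("M")
    ]


# Result lists copied from the "Expected results" and "Current results" sections.
expected = ['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก์', 'น']
current = ['ทดสอบ', 'ตัด', 'คำ', 'ภาษาไทย', 'จอก', '์น']

print(tokens_starting_with_mark(expected))
print(tokens_starting_with_mark(current))
```

A check like this could serve as a regression test once the tokenizer is fixed, by running it on the real `word_tokenize` output and asserting that the returned list is empty.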

Your environment

  • PyThaiNLP version: 3.0.5
  • Python version: 3.8
  • Operating system and version (distro, 32/64-bit): Linux 64 bit
  • More info (Docker, VM, etc.): Ubuntu 20.04 Docker

wannaphong, Apr 22 '22 09:04