pythainlp

ModuleNotFoundError when calling `crfcut` engine in `sent_tokenize` function

Status: Open • pavaris-pm opened this issue 1 year ago • 2 comments

I tried the `crfcut` engine of the `sent_tokenize` function in the stable release of PyThaiNLP, installed via

pip install --upgrade pythainlp

This is what I expected:

from pythainlp.tokenize import sent_tokenize
sent_tokenize(sentence_1, engine="crfcut")
# output: ['ฉันไปประชุมเมื่อวันที่ 11 มีนาคม']

However, I got this error instead:

sent_tokenize(sentence_1, engine="crfcut")

# ModuleNotFoundError: No module named 'pycrfsuite'

Since this is a missing-package problem, it can be solved by running `pip install python-crfsuite`. However, would it be better to fix this so that users don't have to take the extra step of installing crfsuite separately whenever they want to use this engine, or should we leave it as is? What do you think?

pavaris-pm • Nov 07 '23 11:11
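Until the dependency question is settled, a caller can avoid the crash by checking for `pycrfsuite` before choosing an engine. The sketch below is a user-side workaround, not PyThaiNLP code: `pick_sent_engine` is a hypothetical helper, and it assumes your PyThaiNLP version offers a `whitespace+newline` engine for `sent_tokenize` (check the docs for your release). Note that the fallback only splits on whitespace and newlines, so its output can differ noticeably from `crfcut`.

```python
import importlib.util


def pick_sent_engine(preferred: str = "crfcut",
                     fallback: str = "whitespace+newline") -> str:
    """Return `preferred`, or `fallback` if crfcut's backend is missing.

    Hypothetical helper: crfcut imports the `pycrfsuite` module, which is
    provided by the `python-crfsuite` package on PyPI.
    """
    # find_spec returns None (without raising) when a top-level module
    # cannot be found, so this check never triggers the import itself.
    if preferred == "crfcut" and importlib.util.find_spec("pycrfsuite") is None:
        return fallback
    return preferred


if __name__ == "__main__":
    engine = pick_sent_engine()
    print(f"Using sent_tokenize engine: {engine}")
```

With PyThaiNLP installed, the result would be passed straight through, e.g. `sent_tokenize(sentence_1, engine=pick_sent_engine())`. Installing `python-crfsuite` remains the way to get actual `crfcut` output.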

python-crfsuite often causes problems whenever a new Python version is released; see #655. That is why we don't add python-crfsuite to the dependency list.

wannaphong • Nov 08 '23 12:11
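Keeping python-crfsuite out of the required dependencies does not have to mean a bare `No module named 'pycrfsuite'`. One common pattern for optional dependencies is to import them lazily and re-raise with an install hint. This is a minimal sketch of that pattern, not PyThaiNLP's actual implementation; `require_optional` is a hypothetical name, and it is meant for top-level module names such as `pycrfsuite`.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            # A *different* module is missing (e.g. inside the package);
            # re-raise unchanged so the real cause is not masked.
            raise
        raise ModuleNotFoundError(
            f"No module named '{module_name}'. This engine needs an optional "
            f"dependency; install it with: pip install {pip_name}",
            name=module_name,
        ) from exc


if __name__ == "__main__":
    try:
        require_optional("pycrfsuite", "python-crfsuite")
        print("pycrfsuite is available")
    except ModuleNotFoundError as err:
        print(err)
```

The design choice here is that the import cost and failure are deferred until the engine is actually requested, so users who never call `crfcut` are unaffected by python-crfsuite breaking on a new Python release.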

I have been looking for new models to replace all of the crfsuite models, but these models are quite efficient, so they are not worth replacing. Deep learning models are not much better.

wannaphong • Nov 08 '23 12:11