kenlm
Language model cannot be used with multiprocessing
Hi all,
When I use multiprocessing like this:
```python
import multiprocessing
import kenlm

class LM_Decode(object):
    def __init__(self, lm_path):
        self.lm_model = kenlm.LanguageModel(lm_path)

    def decode(self, sent):
        # full_score yields one (log10 prob, n-gram length, oov) tuple per token, ending with </s>
        lm_prob = list(self.lm_model.full_score(sent))
        return lm_prob[-1][0]

if __name__ == "__main__":
    lm_decoder = LM_Decode("lm_path.bin")
    pool = multiprocessing.Pool(processes=4)
    sent_list = ['sent1', 'sent2', 'sent3', 'sent4']
    pred = pool.map(func=lm_decoder.decode, iterable=sent_list)
```
It gives the following error:
```
  File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "kenlm.pyx", line 258, in kenlm.Model.__reduce__ (python/kenlm.cpp:3929)
NameError: name '_kenlm' is not defined
```
What's wrong with it, and how can I fix it? Can anyone help? Thanks a lot!
The underlying C++ code is thread-safe (and, for that matter, the mmap can share memory). I don't know enough about Python, though.
Thanks a lot for the reply. I found that it runs normally in Python 3.7 and above but gives this error in Python 3.6 and below. Maybe the implementation of multiprocessing differs between Python 3.6 and Python 3.7.