kenlm

Language model cannot be used with multiprocessing

Open yjiangling opened this issue 3 years ago • 2 comments

Hi all,

When I use multiprocessing like this:

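```python
import multiprocessing

import kenlm


class LM_Decode(object):

    def __init__(self, lm_path):
        # Load the KenLM model once, in the parent process.
        self.lm_model = kenlm.LanguageModel(lm_path)

    def decode(self, sent):
        # full_score yields one (log10 prob, n-gram length, OOV flag) tuple per token.
        lm_prob = list(self.lm_model.full_score(sent))
        return lm_prob[-1][0]


lm_decoder = LM_Decode("lm_path.bin")
pool = multiprocessing.Pool(processes=4)
sent_list = ['sent1', 'sent2', 'sent3', 'sent4']

# Passing the bound method makes the pool pickle lm_decoder and its model.
pred = pool.map(func=lm_decoder.decode, iterable=sent_list)
```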

It gives the following error:

    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 266, in map
      return self._map_async(func, iterable, mapstar, chunksize).get()
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 644, in get
      raise self._value
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
      put(task)
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/connection.py", line 206, in send
      self._send_bytes(_ForkingPickler.dumps(obj))
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
      cls(buf, protocol).dump(obj)
    File "kenlm.pyx", line 258, in kenlm.Model.__reduce__ (python/kenlm.cpp:3929)
    NameError: name '_kenlm' is not defined

What is wrong here, and how can I fix it? Can anyone help? Thanks a lot!

yjiangling avatar Dec 24 '21 01:12 yjiangling

The underlying C++ code is threadsafe (and for that matter the mmap can share memory). I don't know enough about Python though.

kpu avatar Jan 03 '22 00:01 kpu

> The underlying C++ code is threadsafe (and for that matter the mmap can share memory). I don't know enough about Python though.

Thanks a lot for the reply. I found that it runs normally on Python 3.7 and above but raises this error on Python 3.6 and below. Maybe the implementation of multiprocessing differs between Python 3.6 and Python 3.7.

yjiangling avatar Jan 10 '22 02:01 yjiangling