langid.py
multiprocessing?
Hello,
It seems that langid uses multiprocessing under the hood to make classification faster. Is there any way in Python to force langid to use a single process (i.e. turn off multiprocessing)?
The problem persists even with the simple script below. Please advise on how to turn multiprocessing off.
```python
import string

import numpy as np
from langid.langid import LanguageIdentifier, model

np.random.seed(123)
lang_id_model = LanguageIdentifier.from_modelstring(model, norm_probs=True)

# generate N random strings of fixed length M
alpha = list(string.ascii_lowercase)
N = 100000
M = 40
txt = [''.join(np.random.choice(alpha, M, replace=True)) for _ in range(N)]

# tag each string with its predicted language
lang = []
for txt_i in txt:
    lang_i = lang_id_model.classify(txt_i)
    lang.append(lang_i)
print(len(lang))
```
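The same cap can also be applied from inside the script itself. A minimal sketch, assuming the extra CPU load comes from the BLAS thread pool that NumPy links against (not from langid itself) and that numpy has not yet been imported when these lines run:

```python
import os

# Assumption: the parallelism comes from the BLAS library bundled with
# NumPy. These variables must be set BEFORE numpy is first imported in
# the process, or the thread pool is already sized.
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP-backed BLAS builds
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS-specific override
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL, if NumPy uses it

import numpy as np  # imported only after the variables are set
```

The non-top-of-file import is deliberate: the environment variables only take effect if they are set before the first `import numpy`.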
Update: I forced langid to use fewer threads by setting an environment variable before running the Python script:

```
MAX_CPU_USE=20
export OMP_NUM_THREADS=$MAX_CPU_USE
```
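If the variable cannot be exported before the interpreter starts, the third-party `threadpoolctl` package (an assumption here: it is not part of langid and must be installed separately, e.g. `pip install threadpoolctl`) can cap the native thread pools at runtime, even after numpy has been imported:

```python
import numpy as np
from threadpoolctl import threadpool_limits  # third-party package

a = np.random.rand(500, 500)
with threadpool_limits(limits=1):
    # inside this block, native BLAS calls such as the matrix
    # product below are restricted to a single thread
    b = a @ a
print(b.shape)
```

This caps OpenMP/OpenBLAS/MKL pools only for the duration of the `with` block, so the rest of the program keeps its default threading.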
This lib is just taking so much CPU: 8 cores out of 8. Don't get me wrong, what you built is impressive, and I don't mean to be a critic, but it's taking up all 8 of my 8 cores.
> Update: I forced langid to use fewer threads by setting an environment variable before running the Python script:
> MAX_CPU_USE=20
> export OMP_NUM_THREADS=$MAX_CPU_USE
Is this going to slow classification down? What is the impact here?