langid.py

multiprocessing?

Open ianbstewart opened this issue 5 years ago • 4 comments

Hello,

It seems that langid uses multiprocessing under the hood to make classification faster. Is there any way in Python to force langid to use a single process (turn off multiprocessing)?

ianbstewart • Sep 04 '19 23:09

This problem still persists, even with a simple script like the one below. Please advise on how to turn off multiprocessing.

```python
import string

import numpy as np
from langid.langid import LanguageIdentifier, model

np.random.seed(123)

# build the identifier from the bundled model, with normalized probabilities
lang_id_model = LanguageIdentifier.from_modelstring(model, norm_probs=True)
# generate random text
alpha = list(string.ascii_lowercase)
N = 100000
M = 40
# generate text of fixed length
txt = [''.join(np.random.choice(alpha, M, replace=True)) for _ in range(N)]

# tag text
lang = []
for txt_i in txt:
    lang_i = lang_id_model.classify(txt_i)
    lang.append(lang_i)
print(len(lang))
```

ianbstewart • Apr 21 '20 15:04

Update: I forced langid to use fewer cores by setting an environment variable in the shell before running the Python script.

MAX_CPU_USE=20
export OMP_NUM_THREADS=$MAX_CPU_USE
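The same limit can probably be applied from inside Python, which is what the original question asked for. The sketch below assumes the extra CPU use comes from the threaded BLAS/OpenMP backend behind NumPy rather than from langid itself, which is consistent with `OMP_NUM_THREADS` having an effect. The helper name `limit_numeric_threads` is made up for illustration, and `OPENBLAS_NUM_THREADS` / `MKL_NUM_THREADS` are included only as a guess at which backend a given NumPy build uses.

```python
import os

# Environment variables read by common threaded numeric backends.
# Assumption: which one matters depends on how NumPy was built.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_numeric_threads(n=1):
    """Hypothetical helper: cap backend thread pools at n threads."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)


# Call this before NumPy (and therefore langid) is imported for the
# first time; backends typically read these variables when they load.
limit_numeric_threads(1)

# Only now import the NumPy-backed library, e.g.:
# from langid.langid import LanguageIdentifier, model
```

If NumPy has already been imported elsewhere in the process, setting the variables afterwards will most likely have no effect, so this belongs at the very top of the script.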

ianbstewart • Apr 23 '20 16:04

This library uses a lot of CPU: 8 out of 8 cores. Don't get me wrong, what you did is impressive, and I don't mean to be overly critical, but it takes up every core on the machine.

goors • Dec 09 '23 17:12

> Update: I forced langid to use fewer cores by setting an environment variable in the shell before running the Python script.
>
> MAX_CPU_USE=20
> export OMP_NUM_THREADS=$MAX_CPU_USE

Is this going to slow classification down? What is the impact here?
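Nothing in this thread reports timings, so the impact is best measured directly. A minimal sketch of how that could be done: time the same batch once with the thread limit and once without, in separate runs of the script. `time_classifier` and `dummy_classify` are hypothetical names; the dummy stands in for `lang_id_model.classify` so the snippet runs without langid installed.

```python
import time


def time_classifier(classify, texts):
    """Run classify over texts and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [classify(t) for t in texts]
    elapsed = time.perf_counter() - start
    return results, elapsed


def dummy_classify(text):
    # Stand-in for lang_id_model.classify; returns a (language, score) pair.
    return ("en", 1.0)


texts = ["hello world"] * 1000
results, elapsed = time_classifier(dummy_classify, texts)
print(f"classified {len(results)} texts in {elapsed:.3f}s")
```

Swapping `dummy_classify` for the real classifier and comparing the elapsed time across runs with different `OMP_NUM_THREADS` values should show whether the limit costs anything for a given workload.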

goors • Dec 09 '23 17:12