
TBB4PY patching not working with Python joblib nested parallelism

Status: open. Opened by goplanid 1 year ago; 3 comments.

The TBB patch does not work with the joblib nested-parallelism example below, run on the Intel synthetic benchmark dataset: the code gets stuck when joblib uses the threading backend. Kindly advise.

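```python
from sklearn.cluster import KMeans
import time
import numpy as np
from joblib import Parallel, delayed, parallel_backend

# Intel synthetic benchmark data: 10,000,000 rows x 5 features
path = "../dask_study/Intel_Bench_Datasets/Intel_Bench_Datasets/datasets/"
X = np.load(path + 'kmeans/X_blob_10000000x5_50.npy')
y = np.load(path + 'kmeans/y_blob_10000000x5_50.npy')  # labels, not used below

def kmeans(x):
    # Inner level of parallelism: scikit-learn's KMeans is itself multithreaded
    #model = KMeans(n_clusters=5, random_state=123, copy_x=False)
    model = KMeans(n_clusters=5, random_state=123)
    model.fit(x)
    return model

#X_copy = np.copy(X)
# Split the rows into 16 chunks; each chunk gets its own KMeans fit
data_chunks = np.array_split(X, 16)
st = time.time()
with parallel_backend('threading'):
    # Outer level of parallelism: joblib worker threads, one task per chunk.
    # Reported to get stuck here when the script runs under `python -m tbb`.
    results = Parallel(n_jobs=-1)(delayed(kmeans)(chunk) for chunk in data_chunks)
    #KMeans(n_clusters=5, random_state=123).fit(X, y)
print(f"Time Taken to run Kmeans: {time.time() - st}")
```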

goplanid, May 31 '23 11:05

@goplanid, which command are you using for TBB4PY patching? Have you tried disabling inter-process coordination by not specifying the --ipc option?

dnmokhov, Jun 02 '23 00:06
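For context on the patching question: `python -m tbb script.py` applies the TBB4PY patches for the whole script. The same patching is, as far as I know, also exposed in-process by the tbb package as the `tbb.Monkey` context manager, which swaps `multiprocessing.pool.ThreadPool` for a TBB-based pool (joblib's threading backend is assumed here to build on that class). The sketch below assumes that API, falls back to the stock pool when tbb is not installed, and uses a hypothetical `work()` function as a stand-in for one KMeans fit.

```python
import contextlib
import multiprocessing.pool

try:
    import tbb  # TBB4PY, assumed installed (e.g. pip install tbb)
    patched = tbb.Monkey  # assumed: context manager that swaps in TBB pools
except ImportError:
    patched = contextlib.nullcontext  # fallback: keep the stock ThreadPool


def work(n):
    # Hypothetical CPU-bound task standing in for one KMeans fit
    return sum(i * i for i in range(n))


with patched():
    # Look ThreadPool up through the module inside the context: a name bound
    # earlier via `from multiprocessing.pool import ThreadPool` would still
    # refer to the unpatched class. Patching via `python -m tbb` before the
    # script starts avoids that.
    pool = multiprocessing.pool.ThreadPool(4)
    try:
        results = pool.map(work, [10, 100, 1000])
    finally:
        pool.close()
        pool.join()

print(results)  # [285, 328350, 332833500]
```

The `--ipc` option only adds inter-process coordination on top of this patching, so running without it, as in this report, still swaps in the TBB pool.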

I am just using -m tbb, without --ipc.

goplanid, Jun 02 '23 03:06

Thank you. I could not reproduce the hang using the current master with a random dataset.

Could you please share your X_blob_10000000x5_50.npy and y_blob_10000000x5_50.npy?

dnmokhov, Jun 02 '23 17:06
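Because the .npy files were not shared, one way to run the same two-level experiment without them is to substitute a random dataset, as was done when trying to reproduce the hang. This sketch assumes scikit-learn's `make_blobs` as the generator; the sample count and number of centers are illustrative assumptions (far smaller than the original 10,000,000 x 5 arrays), and `fit_kmeans` is a hypothetical name.

```python
import time

import numpy as np
from joblib import Parallel, delayed, parallel_backend
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic stand-in for X_blob_10000000x5_50.npy with the same
# 5 features but far fewer rows; raise n_samples to approach the original.
X, _ = make_blobs(n_samples=100_000, n_features=5, centers=5, random_state=123)


def fit_kmeans(chunk):
    # Inner level: scikit-learn's KMeans is itself multithreaded
    return KMeans(n_clusters=5, random_state=123).fit(chunk)


data_chunks = np.array_split(X, 16)
start = time.perf_counter()
with parallel_backend("threading"):
    # Outer level: joblib worker threads, one task per chunk
    results = Parallel(n_jobs=-1)(delayed(fit_kmeans)(c) for c in data_chunks)
print(f"Fitted {len(results)} models in {time.perf_counter() - start:.2f} s")
```

Running the script once as `python script.py` and once as `python -m tbb script.py` separates the effect of the TBB patch from the workload itself.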

@goplanid is this issue still relevant?

nofuturre, Jul 11 '24 13:07

If anyone encounters this issue in the future, please open a new issue with a link to this one.

nofuturre, Jul 23 '24 07:07