
Not thread safe?

Open · Tripton opened this issue on Oct 14, 2019 · 6 comments

Hello,

In my current use case it would be great if I could use multithreading/multiprocessing, because I have a lot of models to fit and my GPU could handle the load.

However, I get strange results when using threading. Here is a small script to reproduce the issue:

from thundergbm import TGBMRegressor
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from multiprocessing.pool import ThreadPool as Pool
import functools
import numpy as np


def main():
    print("Without parallel threads: " + str(calc(2, 1)))
    print("With parallel threads: " + str(calc(2, 2)))

def calc(num_repeats, pool_size):
    # Inflate the Boston housing data 1000x so each fit does real GPU work.
    x, y = load_boston(return_X_y=True)
    x = np.repeat(x, 1000, axis=0)
    y = np.repeat(y, 1000)

    # One copy of the data per job.
    x = np.asarray([x] * num_repeats)
    y = np.asarray([y] * num_repeats)

    # Map the fit jobs over a thread pool; pool_size=1 is the sequential baseline.
    pool = Pool(pool_size)
    func = functools.partial(fit_gbdt, x=x, y=y)
    results = pool.map(func, range(num_repeats))
    return results

def fit_gbdt(idx, x, y):
    # Train one regressor per job and return its training RMSE.
    clf = TGBMRegressor(verbose=0)
    clf.fit(x[idx], y[idx])
    y_pred = clf.predict(x[idx])
    rmse = mean_squared_error(y[idx], y_pred) ** 0.5
    return rmse

if __name__ == '__main__':
    main()

Sometimes I get the error:

2019-10-14 10:48:22,422 FATAL [default] Check failed: [error == cudaSuccess]  an illegal memory access was encountered
2019-10-14 10:48:22,426 WARNING [default] Aborting application. Reason: Fatal log at [/thundergbm/include\thundergbm/util/device_lambda.cuh:49]
2019-10-14 10:48:22,434 FATAL [default] Check failed: [error == cudaSuccess]  an illegal memory access was encountered

and sometimes bad results:

Without parallel threads: [0.011103539879039557, 0.011174528160149052]
With parallel threads: [0.04638805412265755, 4.690559078455652]

Multiprocessing does work, but only if I'm not returning a TGBM instance. Returning the instance would be the best solution, but it doesn't work at all because TGBM is not picklable.
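
For reference, here is a minimal sketch of the process-based variant that does work for me, assuming the worker returns only a picklable float (the RMSE) rather than the model itself:

from multiprocessing import Pool
import functools

def fit_gbdt_mp(idx, x, y):
    # Runs in its own process with its own CUDA context; only the
    # picklable RMSE float crosses the process boundary, never the model.
    from thundergbm import TGBMRegressor
    from sklearn.metrics import mean_squared_error
    clf = TGBMRegressor(verbose=0)
    clf.fit(x[idx], y[idx])
    return mean_squared_error(y[idx], clf.predict(x[idx])) ** 0.5

# Used exactly like the ThreadPool version above:
# pool = Pool(2)
# results = pool.map(functools.partial(fit_gbdt_mp, x=x, y=y), range(2))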

I'm using Windows 10 with CUDA 10.

From my experience it's sometimes hard to do multithreading with CUDA (TensorFlow, for example), but multiprocessing should be fine as long as the object is picklable. Maybe it is possible to make TGBM picklable, or to find the bug that causes multithreading to crash.
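
In the meantime, the only user-side mitigation I can think of is serializing all ThunderGBM calls behind a single lock, so no two threads ever touch the GPU at the same time. A minimal sketch (the lock is my own workaround, not anything ThunderGBM provides, and it gives up all GPU concurrency):

import threading

_tgbm_lock = threading.Lock()

def fit_gbdt_locked(idx, x, y):
    # Drop-in replacement for fit_gbdt above: only one thread may run
    # fit/predict at a time, trading concurrency for safety.
    with _tgbm_lock:
        clf = TGBMRegressor(verbose=0)
        clf.fit(x[idx], y[idx])
        y_pred = clf.predict(x[idx])
    return mean_squared_error(y[idx], y_pred) ** 0.5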

Many thanks!

Tripton · Oct 14 '19

Thanks. We will look into this problem and get back to you if we have any updates.

zeyiwen · Oct 15 '19

@Tripton Just a quick update: you can use the exact method for the tree_method option as a workaround. We are working hard to locate the bug, which appears to be quite challenging due to the massive parallelism inside ThunderGBM combined with the multithreading outside of it.
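
Concretely, that means constructing the regressor like this (a one-line sketch; all other parameters keep their defaults):

clf = TGBMRegressor(verbose=0, tree_method='exact')  # exact split finding as a workaround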

zeyiwen · Oct 23 '19

Hi @Tripton, thanks for your report. ThunderGBM is thread-safe now. I have run your code a dozen times on our server, and no errors appeared. Please reinstall the library and give it a try. Thanks.

Kurt-Liuhf · Oct 25 '19

This issue should be solved now, so we would like to close it.

zeyiwen · Oct 28 '19

I would like to reopen this issue. I made some small modifications to the above code:

from thundergbm import TGBMRegressor
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from multiprocessing.pool import ThreadPool as Pool
import functools
import numpy as np

def main():
    print("Without parallel threads and 1 gpu: " + str(calc(2, 1, 1)))
    print("With parallel threads and 1 gpu: " + str(calc(2, 2, 1)))
    print("Without parallel threads and 2 gpu: " + str(calc(2, 1, 2)))
    print("With parallel threads and 2 gpu: " + str(calc(2, 2, 2)))

def calc(num_repeats, pool_size, n_gpus):
    x, y = load_boston(return_X_y=True)
    x = np.repeat(x, 1000, axis=0)
    y = np.repeat(y, 1000)

    x = np.asarray([x]*num_repeats)
    y = np.asarray([y]*num_repeats)

    pool = Pool(pool_size)
    func = functools.partial(fit_gbdt, x=x, y=y, n_gpus=n_gpus)
    results = pool.map(func, range(num_repeats))
    return results

def fit_gbdt(idx, x, y, n_gpus):
    # Same worker as before, but the number of GPUs is now configurable.
    clf = TGBMRegressor(verbose=0, n_gpus=n_gpus)
    clf.fit(x[idx], y[idx])
    y_pred = clf.predict(x[idx])
    rmse = mean_squared_error(y[idx], y_pred) ** 0.5
    return rmse

if __name__ == '__main__':
    main()

Now sometimes I see output like this:

In [2]: %run test_tgbm.py                                                                                                                                                           
Without parallel threads and 1 gpu: [0.011102704273042477, 0.01117481052674395]
With parallel threads and 1 gpu: [0.01117491826081946, 0.011103542490388574]
Without parallel threads and 2 gpu: [0.01239784807141135, 0.012399129722859907]
Segmentation fault (core dumped)
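
Until this is fixed, the fallback I am considering is full process isolation: each job runs in a freshly spawned process that is pinned to a single device via CUDA_VISIBLE_DEVICES before ThunderGBM is imported. A rough sketch (the env-var pinning is standard CUDA behavior rather than a ThunderGBM feature, and fit_one is my own hypothetical helper):

import os
from multiprocessing import get_context

def fit_one(args):
    idx, gpu_id, x, y = args
    # Pin this worker to one GPU before any CUDA initialization happens.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    from thundergbm import TGBMRegressor
    from sklearn.metrics import mean_squared_error
    clf = TGBMRegressor(verbose=0, n_gpus=1)
    clf.fit(x[idx], y[idx])
    return mean_squared_error(y[idx], clf.predict(x[idx])) ** 0.5

# 'spawn' gives each worker a clean process with no inherited CUDA state:
# with get_context('spawn').Pool(2) as pool:
#     rmses = pool.map(fit_one, [(0, 0, x, y), (1, 1, x, y)])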

civilinformer · Jun 12 '20

Hi @civilinformer, thank you for your feedback. Your test results show that there may still be thread-safety bugs in ThunderGBM. We will run further tests and fix the thread-safety issue more thoroughly, and we will get back to you if there is any update. Thank you.

Kurt-Liuhf · Jun 13 '20