[gpu] [python] LightGBMError: No OpenCL device found

Open aidiss opened this issue 3 years ago • 8 comments

Description

Reproducible example

Connect to localhost:8888 jupyter notebook

from lightgbm import LGBMClassifier
from sklearn.datasets import make_moons
model = LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=- 1, learning_rate=0.1, n_estimators=300, device = "gpu")
train, label = make_moons(n_samples=300000, shuffle=True, noise=0.3, random_state=None)
model.fit(train, label)

Results in

LightGBMError                             Traceback (most recent call last)
<ipython-input-1-3cadc7bec646> in <module>
      3 model = LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=- 1, learning_rate=0.1, n_estimators=300, device = "gpu")
      4 train, label = make_moons(n_samples=300000, shuffle=True, noise=0.3, random_state=None)
----> 5 model.fit(train, label)

Environment info

LightGBM version or commit hash: Docker version

Command(s) you used to install LightGBM

Ran everything according to https://github.com/microsoft/LightGBM/tree/master/docker/gpu

mkdir lightgbm-docker
cd lightgbm-docker
wget https://raw.githubusercontent.com/Microsoft/LightGBM/master/docker/gpu/dockerfile.gpu
docker build -f dockerfile.gpu -t lightgbm-gpu .
nvidia-docker run --rm -d --name lightgbm-gpu -p 8888:8888 -v /home:/home lightgbm-gpu

Full traceback

---------------------------------------------------------------------------
LightGBMError                             Traceback (most recent call last)
<ipython-input-1-3cadc7bec646> in <module>
      3 model = LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=- 1, learning_rate=0.1, n_estimators=300, device = "gpu")
      4 train, label = make_moons(n_samples=300000, shuffle=True, noise=0.3, random_state=None)
----> 5 model.fit(train, label)

/opt/conda/envs/py3/lib/python3.8/site-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    888                     valid_sets[i] = (valid_x, self._le.transform(valid_y))
    889 
--> 890         super().fit(X, _y, sample_weight=sample_weight, init_score=init_score, eval_set=valid_sets,
    891                     eval_names=eval_names, eval_sample_weight=eval_sample_weight,
    892                     eval_class_weight=eval_class_weight, eval_init_score=eval_init_score,

/opt/conda/envs/py3/lib/python3.8/site-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    681             init_model = init_model.booster_
    682 
--> 683         self._Booster = train(params, train_set,
    684                               self.n_estimators, valid_sets=valid_sets, valid_names=eval_names,
    685                               early_stopping_rounds=early_stopping_rounds,

/opt/conda/envs/py3/lib/python3.8/site-packages/lightgbm/engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    226     # construct booster
    227     try:
--> 228         booster = Booster(params=params, train_set=train_set)
    229         if is_valid_contain_train:
    230             booster.set_train_data_name(train_data_name)

/opt/conda/envs/py3/lib/python3.8/site-packages/lightgbm/basic.py in __init__(self, params, train_set, model_file, model_str, silent)
   2232             params_str = param_dict_to_str(params)
   2233             self.handle = ctypes.c_void_p()
-> 2234             _safe_call(_LIB.LGBM_BoosterCreate(
   2235                 train_set.handle,
   2236                 c_str(params_str),

/opt/conda/envs/py3/lib/python3.8/site-packages/lightgbm/basic.py in _safe_call(ret)
    108     """
    109     if ret != 0:
--> 110         raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
    111 
    112 

LightGBMError: No OpenCL device found

Additional Comments

I have seen similar issues that were closed as resolved. As I understand it, the solution was to add mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd. But this is now included in the Dockerfile.
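For anyone checking whether that registration actually took effect inside the container, here is a minimal sketch. It assumes the standard ICD loader layout, where each *.icd file under /etc/OpenCL/vendors holds the name or path of a vendor library. The helper names read_icd_entries and library_loads are made up for illustration and are not part of LightGBM.

```python
import ctypes
from pathlib import Path


def read_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_name} for every *.icd file in vendors_dir."""
    entries = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return entries  # no vendors directory, so no registered OpenCL vendors
    for icd_file in sorted(directory.glob("*.icd")):
        # An ICD file is expected to contain one library name or path.
        entries[icd_file.name] = icd_file.read_text().strip()
    return entries


def library_loads(name):
    """Return True if the dynamic loader can open the named shared library."""
    if not name:
        return False
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False
```

Run inside the container, something like `for icd, lib in read_icd_entries().items(): print(icd, lib, library_loads(lib))` shows whether the registered library exists and can be opened. A registered but unloadable library would be consistent with the error above, although it is not the only possible cause.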

aidiss avatar Aug 01 '21 17:08 aidiss

I get the same error. I tried following these guidelines: https://docs.docker.com/docker-for-windows/wsl/ but still could not get LightGBM + GPU to run in Docker. I can use LightGBM + GPU without Docker by installing the library locally, but not inside a container.

I think my issue is related to drivers, possibly to GPU pass-through on Windows (e.g. https://stackoverflow.com/questions/49589229/is-gpu-pass-through-possible-with-docker-for-windows).

When I try to run:

docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

I receive this error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
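Since the failure happens in nvidia-container-cli before any LightGBM code runs, it may help to check the host side first. The sketch below is only a generic wrapper around subprocess. The helper name run_check is hypothetical, and which commands to pass to it depends on your setup.

```python
import shutil
import subprocess


def run_check(cmd, timeout=60):
    """Run cmd (a list of strings) and return (ok, message).

    ok is False when the executable is not on PATH, the command times out,
    or it exits with a non-zero status.
    """
    if shutil.which(cmd[0]) is None:
        return False, f"{cmd[0]} not found on PATH"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"{cmd[0]} timed out after {timeout}s"
    # Prefer stdout; fall back to stderr so error text is not lost.
    return proc.returncode == 0, (proc.stdout or proc.stderr).strip()
```

For example, `run_check(["nvidia-smi"])` on the host (or inside WSL) tells you whether the driver is visible at all. If that already fails, the container runtime cannot be expected to see the GPU either.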

Can you clarify which steps I should follow to use LightGBM + GPU in Docker on Windows 10 Pro?

valentasgruzauskas avatar Aug 03 '21 14:08 valentasgruzauskas

@aidiss @valentasgruzauskas Are you both using WSL?

StrikerRUS avatar Aug 03 '21 19:08 StrikerRUS

@StrikerRUS No, I am on Linux.

aidiss avatar Aug 03 '21 19:08 aidiss

@aidiss

I have seen similar issues that are closed as resolved. As I understand the solution was to add mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd. But this now is included into the Dockerfile.

I was encountering the same issue as you and was looking at these same solutions.

But it turns out that, for me, libnvidia-opencl.so.1 is for some reason no longer present anywhere in the filesystem of the image built from the Dockerfile at https://github.com/microsoft/LightGBM/blob/master/docker/gpu/dockerfile.gpu. Installing nvidia-opencl-icd-375 adds it. Yet clinfo still reports 0 devices.
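To make the clinfo check scriptable, here is a rough sketch that counts devices from its default output. It assumes clinfo prints one "Number of devices" line per platform followed by an integer. That layout may differ between clinfo versions, so treat the parsing as an assumption. The function names are made up for illustration.

```python
import re
import shutil
import subprocess


def count_opencl_devices(clinfo_output):
    """Sum the values of all 'Number of devices' lines in clinfo's output."""
    counts = re.findall(
        r"^\s*Number of devices\s+(\d+)\s*$", clinfo_output, flags=re.MULTILINE
    )
    return sum(int(n) for n in counts)


def devices_seen_by_clinfo():
    """Run clinfo and return the device count, or None if clinfo is not installed."""
    if shutil.which("clinfo") is None:
        return None
    proc = subprocess.run(["clinfo"], capture_output=True, text=True)
    return count_opencl_devices(proc.stdout)
```

A result of 0 from inside the container means no OpenCL device is exposed to it, in which case "No OpenCL device found" from LightGBM is the expected outcome rather than a LightGBM bug.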

RyanVereque avatar Aug 25 '21 11:08 RyanVereque

I have the same issue, I'm using WSL.

eXTure avatar Nov 15 '21 09:11 eXTure

I hadn't run the sample code since my last comment until just now. With the latest commit, it no longer gives this error. I haven't checked why or how, though!

RyanVereque avatar Nov 19 '21 14:11 RyanVereque

It worked well for a few months. Now I am getting the same error again.

aidiss avatar Jul 01 '22 14:07 aidiss

@aidiss Thanks for using LightGBM. So now, even with mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd, the problem still occurs?

shiyu1994 avatar Jul 12 '22 07:07 shiyu1994

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

github-actions[bot] avatar Aug 23 '22 04:08 github-actions[bot]

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions[bot] avatar Aug 15 '23 20:08 github-actions[bot]