[BUG] raft::cuda_error with cuML's RandomForestClassifier on GPU when selecting a device with cp.cuda.Device(2).use()
Hi,
I am encountering an issue when selecting a GPU using cp.cuda.Device(2).use(). When I do not specify the GPU device, the script runs without errors.
Description:
I am using RAPIDS 24.06, CUDA 12.4, and Python 3.9. I encounter a raft::cuda_error when using cuML's RandomForestClassifier after selecting a specific GPU device.
Code:
import cupy as cp
import os
import cudf
import cuml
import pandas as pd
from sklearn import model_selection
from cuml import datasets
import dask
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
from dask.utils import parse_bytes
from numba import cuda
import dask_cudf
from cuml.ensemble import RandomForestClassifier as cuRFC
from cuml import ForestInference
import joblib
from tqdm import tqdm
from scipy import stats
from sklearn import metrics
import pickle
from collections import Counter
import random
import shutil
import time
import gc
import warnings
import numpy as np
import multiprocessing
cp.cuda.Device(2).use()
seed = 42  # placeholder value; the original script defines its own seed elsewhere
model_parameter = cuRFC(n_estimators=500, max_features='log2', random_state=seed)
Error Message:
CURFC
/home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/api_decorators.py:344: UserWarning: For reproducible results in Random Forest Classifier or for almost reproducible results in Random Forest Regressor, n_streams=1 is recommended. If n_streams is > 1, results may vary due to stream/thread timing differences, even when random_state is set
return func(**kwargs)
terminate called after throwing an instance of 'raft::cuda_error'
what(): CUDA error encountered at: file=/opt/conda/conda-bld/work/cpp/src/decisiontree/batched-levelalgo/builder.cuh line=331: call='cudaMemsetAsync(done_count, 0, sizeof(int) * max_batch * n_col_blks, builder_stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 7 stack frames
#1 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5a [0x767aa52af28a]
#2 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so: ML::DT::Builder<ML::DT::GiniObjectiveFunction<float, int, int> >::assignWorkspace(char*, char*) +0x308 [0x767aa5dc13e8]
#3 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so: ML::DT::Builder<ML::DT::GiniObjectiveFunction<float, int, int> >::Builder(raft::handle_t const&, CUstream_st*, int, unsigned long, ML::DT::DecisionTreeParams const&, float const*, int const*, int, int, rmm::device_uvector<int>*, int, ML::DT::Quantiles<float, int> const&) +0x2fc [0x767aa5dc19cc]
#4 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so(+0xdf021f) [0x767aa5df021f]
#5 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/sklearn/utils/../../../../libgomp.so.1(+0x18f09) [0x767abecbbf09]
#6 in /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x767ed8294ac3]
#7 in /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x767ed8326850]
/home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 24 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Aborted (core dumped)
Any suggestions for resolving this issue?
Thank you so much.
RH
Thanks for the issue @m946107011. Interestingly enough, I have not used CuPy's CUDA device selection mechanism with RAPIDS in general, and it is untested with cuML in particular. I would recommend using the environment variable CUDA_VISIBLE_DEVICES instead. Are you planning to use multi-GPU capabilities? Asking since I saw the multiple Dask imports in the code you shared.
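For reference, a minimal sketch of that approach (the device index 2 and the seed value here are just placeholders): setting CUDA_VISIBLE_DEVICES before CuPy/cuML initialize CUDA makes the chosen physical GPU appear as device 0, so no explicit device selection is needed in the script:
import os
# Restrict visibility to physical GPU 2 before CuPy/cuML touch the CUDA driver
# (alternatively, export CUDA_VISIBLE_DEVICES=2 in the shell before launching Python).
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import cupy as cp
from cuml.ensemble import RandomForestClassifier as cuRFC
# The only visible GPU is now ordinal 0, so cuML uses it by default,
# without calling cp.cuda.Device(...).use().
model = cuRFC(n_estimators=500, max_features='log2', random_state=42)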
I am getting the same error with RAPIDS 25.08 (dev version).
Reproducer (make sure to run it on a machine with 2 or more GPUs):
import cupy as cp
from cuml.ensemble import RandomForestClassifier as cuRFC
with cp.cuda.Device(1):
    X = cp.random.normal(size=(10, 4)).astype(cp.float32)
    y = cp.asarray([0, 1] * 5, dtype=cp.int32)
    cuml_model = cuRFC(max_features=1.0, n_bins=8, n_estimators=2)
    cuml_model.fit(X, y)
Error:
terminate called after throwing an instance of 'raft::cuda_error'
what(): CUDA error encountered at: file=/home/phcho/Desktop/cuml/cpp/src/decisiontree/batched-levelalgo/builder.cuh line=331: call='cudaMemsetAsync(done_count, 0, sizeof(int) * max_batch * n_col_blks, builder_stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 7 stack frames
#1 in /home/phcho/miniforge3/envs/cuml_dev/lib/libcuml++.so: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x9d [0x7fc2ef1bc34d]
#2 in /home/phcho/miniforge3/envs/cuml_dev/lib/libcuml++.so: ML::DT::Builder<ML::DT::GiniObjectiveFunction<float, int, int> >::assignWorkspace(char*, char*) +0x2ea [0x7fc2ef67aa1a]
#3 in /home/phcho/miniforge3/envs/cuml_dev/lib/libcuml++.so: ML::DT::Builder<ML::DT::GiniObjectiveFunction<float, int, int> >::Builder(raft::handle_t const&, CUstream_st*, int, unsigned long, ML::DT::DecisionTreeParams const&, float const*, int const*, int, int, rmm::device_uvector<int>*, int, ML::DT::Quantiles<float, int> const&) +0x2d4 [0x7fc2ef67af84]
#4 in /home/phcho/miniforge3/envs/cuml_dev/lib/libcuml++.so(+0x6a06e1) [0x7fc2ef6a06e1]
#5 in /home/phcho/miniforge3/envs/cuml_dev/lib/python3.13/site-packages/sklearn/utils/../../../../libgomp.so.1(+0x19ec4) [0x7fc2f35bcec4]
#6 in /lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7fc3d0a9caa4]
#7 in /lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7fc3d0b29c3c]
Aborted (core dumped)
Note: the error disappears when I add the argument n_streams=1 to the constructor, as in the sketch below.
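For reference, this is the same reproducer with only n_streams=1 added to the constructor; with that change the cudaMemsetAsync failure no longer occurs on the non-default device:
import cupy as cp
from cuml.ensemble import RandomForestClassifier as cuRFC
with cp.cuda.Device(1):
    X = cp.random.normal(size=(10, 4)).astype(cp.float32)
    y = cp.asarray([0, 1] * 5, dtype=cp.int32)
    # Forcing a single CUDA stream works around the error described above.
    cuml_model = cuRFC(max_features=1.0, n_bins=8, n_estimators=2, n_streams=1)
    cuml_model.fit(X, y)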
Is this bug still present in the 25.10 nightlies?
Yes, this bug still exists; the reproducer above is still valid.