[CUDA] Crash when using device_type=cuda
## Description
I'm trying to use LightGBM on a multi-GPU NVIDIA V100 system. When `device_type=cuda` is set I get a segmentation fault; with `device_type=gpu` it works fine. I'm building LightGBM from the latest checkout of master.
```text
(gdb) where
#0  0x00000ede46d7cfc9 in LightGBM::CUDARegressionObjectiveInterface<LightGBM::RegressionL2loss>::Init(LightGBM::Metadata const&, int) () from /usr/local/lib/lib_lightgbm.so
#1  0x00000ede46466302 in LightGBM::Booster::CreateObjectiveAndMetrics (this=0xedd2d9c2800) at /mnt/slowstore/pub/LightGBM/src/c_api.cpp:213
#2  0x00000ede4643b33f in LightGBM::Booster::Booster (this=0xedd2d9c2800, train_data=0xedd2d1e4680, parameters=0xedd30d4a900 "boosting=gbdt objective=regression gpu_use_dp=false tree_learner=data max_bin=255 num_leaves=256 min_data_in_leaf=100 learning_rate=0.01 num_iterations=5000 feature_fraction=0.8 bagging_fraction=0.8 b"...) at /mnt/slowstore/pub/LightGBM/src/c_api.cpp:183
#3  LGBM_BoosterCreate (train_data=0xedd2d1e4680, parameters=0xedd30d4a900 "boosting=gbdt objective=regression gpu_use_dp=false tree_learner=data max_bin=255 num_leaves=256 min_data_in_leaf=100 learning_rate=0.01 num_iterations=5000 feature_fraction=0.8 bagging_fraction=0.8 b"..., out=0x7ffc34df6c28) at /mnt/slowstore/pub/LightGBM/src/c_api.cpp:1944
#4  0x00000ede996a988d in svr::kernel::kernel_gbm<double>::init (this=this@entry=0xedd3263fcd0, X_t=..., Y=...) at /usr/include/c++/14/bits/basic_string.h:227
#5  0x00000ede99825dd2 in _ZN3svr9datamodel9OnlineSVR4tuneEv._omp_fn.0(void) () at /mnt/faststore/repo/tempus-core/SVRRoot/OnlineSVR/src/onlinesvr_tune_fast.cpp:145
(gdb) list
69      in ./nptl/pthread_mutex_trylock.c
(gdb) up
#1  0x00000ede46466302 in LightGBM::Booster::CreateObjectiveAndMetrics (this=0xedd2d9c2800) at /mnt/slowstore/pub/LightGBM/src/c_api.cpp:213
213         objective_fun_->Init(train_data_->metadata(), train_data_->num_data());
(gdb) list -10
198           boosting_->MergeFrom(other->boosting_.get());
199         }
200
201         ~Booster() {
202         }
203
204         void CreateObjectiveAndMetrics() {
205           // create objective function
206           objective_fun_.reset(ObjectiveFunction::CreateObjectiveFunction(config_.objective,
207                                                                           config_));
```
The parameter string is built as follows:

```cpp
s << "boosting=gbdt objective=regression gpu_use_dp=false tree_learner=data max_bin=" LGBM_MAXBIN " num_leaves=256 min_data_in_leaf=100 learning_rate=" << PROPS.get_k_learn_rate() << " num_iterations=" << PROPS.get_k_epochs() <<
     " feature_fraction=0.8 bagging_fraction=0.8 bagging_freq=5 metric=l2 save_binary=true use_missing=false force_col_wise=true num_threads=" << C_n_cpu << " device_type=cuda num_gpu=" << common::gpu_handler_1::get().get_gpu_devices_count();
```
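For reference, with stand-in literals for `PROPS.get_k_learn_rate()`, `PROPS.get_k_epochs()`, `C_n_cpu` and the detected GPU count (none of which are defined in this report; the values below are taken from the string visible in the backtrace), the stream above can be sketched as a self-contained function:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sketch of the parameter-string construction above, with the project-specific
// helpers (PROPS, C_n_cpu, gpu_handler_1) replaced by plain arguments.
std::string build_params(double learn_rate, int epochs, int n_cpu, int n_gpu) {
    std::stringstream s;
    s << "boosting=gbdt objective=regression gpu_use_dp=false tree_learner=data"
         " max_bin=255 num_leaves=256 min_data_in_leaf=100 learning_rate=" << learn_rate
      << " num_iterations=" << epochs
      << " feature_fraction=0.8 bagging_fraction=0.8 bagging_freq=5 metric=l2"
         " save_binary=true use_missing=false force_col_wise=true num_threads=" << n_cpu
      << " device_type=cuda num_gpu=" << n_gpu;
    return s.str();
}
```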
## Reproducible example
## Environment info
LightGBM version or commit hash:
Command(s) you used to install LightGBM
```shell
[20250705-05:27:46] zarko@tempus:/mnt/faststore/repo/tempus-core/build$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.10
Release:        24.10
Codename:       oracular
```
`nvidia-smi`:

```text
Sat Jul  5 05:28:10 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03              Driver Version: 575.51.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-FHHL-16GB           On  |   00000000:03:00.0 Off |                    0 |
| N/A   36C    P0             24W /  100W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100-FHHL-16GB           On  |   00000000:04:00.0 Off |                    0 |
| N/A   35C    P0             22W /  100W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla V100-FHHL-16GB           On  |   00000000:05:00.0 Off |                    0 |
| N/A   34C    P0             23W /  100W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla V100-FHHL-16GB           On  |   00000000:82:00.0 Off |                    0 |
| N/A   34C    P0             25W /  100W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
## Additional Comments
Thanks for using LightGBM.
Could you please provide the information the issue template asks for?
It's difficult to help if we cannot reproduce the issue.
- version of LightGBM
- exact commands you used to install it
- minimal, reproducible example (code we could use to try to reproduce the error)
If you haven't seen it before, please also review https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax. It has some advice for formatting text on GitHub.
Version: `4.6.0.99` (from `cat VERSION.txt`)

Last commit:

```text
commit e7c6c4371b5d725902a09a80b4d6c36e432a4381 (HEAD -> master, origin/master, origin/HEAD)
Author: Nick Miller [email protected]
Date:   Fri Jun 20 20:59:13 2025 -0700

    [ci] [R-package] Add period after specified linter names in `nolint` comments (#6950)
```

On branch master; the branch is up to date with 'origin/master'.
Install commands (CMake cache settings):

```text
BUILD_CLI                        ON
BUILD_CPP_TEST                   OFF
BUILD_STATIC_LIB                 OFF
Boost_FILESYSTEM_LIBRARY_RELEA   /usr/local/lib/libboost_filesystem.so.1.87.0
Boost_INCLUDE_DIR                /usr/local/include
Boost_SYSTEM_LIBRARY_RELEASE     /usr/local/lib/libboost_system.so.1.87.0
CMAKE_BUILD_TYPE                 Release
CMAKE_CUDA_ARCHITECTURES         70
CMAKE_CXX_COMPILER_LAUNCHER      ccache
CMAKE_INSTALL_PREFIX             /usr/local
ENABLED_SANITIZERS
INSTALL_HEADERS                  ON
USE_CUDA                         ON
USE_DEBUG                        OFF
USE_GPU                          ON
USE_HOMEBREW_FALLBACK            ON
USE_MPI                          ON
USE_OPENMP                       ON
USE_SANITIZER                    OFF
USE_SWIG                         OFF
USE_TIMETAG                      OFF
__BUILD_FOR_PYTHON               OFF
__BUILD_FOR_R                    OFF
__INTEGRATE_OPENCL               OFF
```

followed by `sudo make install`.
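A reconstruction of the build sequence implied by the cache values above (the exact flags passed on the original command line are an assumption; only the cache listing and `sudo make install` are given in the report):

```shell
# Assumed out-of-source Makefile build matching the CMake cache above.
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM && mkdir build && cd build
cmake -DUSE_CUDA=ON -DUSE_GPU=ON -DUSE_MPI=ON \
      -DCMAKE_CUDA_ARCHITECTURES=70 -DCMAKE_BUILD_TYPE=Release ..
make -j"$(nproc)"
sudo make install
```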
Minimal reproducible example (`PROPS`, `C_n_cpu`, `lg_errchk`, `lgbm_log`, `manifold_features_t`, `manifold_labels` and `common::gpu_context` come from the surrounding project):

```cpp
#define LGBM_MAXBIN "255"

constexpr char C_lgbm_dataset_parameters[] = "max_bin=" LGBM_MAXBIN " use_missing=false save_binary=true";

std::string get_gbm_parameters(const uint16_t gpu_id)
{
    std::stringstream s;
    s << "objective=regression tree_learner=data num_leaves=256 early_stopping_round=200 seed=123 learning_rate=" << PROPS.get_k_learn_rate() << " num_iterations=" << PROPS.get_k_epochs()
      << " metric=l2 force_col_wise=true num_threads=" << C_n_cpu << " device_type=gpu " << C_lgbm_dataset_parameters;
#ifndef NDEBUG
    s << " verbosity=1 ";
#endif
    return s.str();
}

int main(const int argc, const char **argv)
{
    const uint32_t n_samples_2 = 4000000;
    const uint32_t n_manifold_features = 160;
    lg_errchk(LGBM_SetMaxThreads(C_n_cpu));
    DatasetHandle train_dataset;
    lg_errchk(LGBM_DatasetCreateFromMat(manifold_features_t.mem, C_API_DTYPE_FLOAT32, n_samples_2, n_manifold_features, 1, // is_row_major = 1 (row-major order)
                                        C_lgbm_dataset_parameters, nullptr, &train_dataset));
    lg_errchk(LGBM_DatasetSetField(train_dataset, "label", manifold_labels.mem, n_samples_2, C_API_DTYPE_FLOAT32));
    lg_errchk(LGBM_RegisterLogCallback(lgbm_log));
    BoosterHandle booster;
    common::gpu_context ctx;
    const auto gbm_parameters = get_gbm_parameters(ctx.phy_id());
    lg_errchk(LGBM_BoosterCreate(train_dataset, gbm_parameters.c_str(), &booster));
    int update_finished = 0;
    auto iter = PROPS.get_k_epochs() + 1;
    assert(iter);
    while (update_finished == 0 && --iter) lg_errchk(LGBM_BoosterUpdateOneIter(booster, &update_finished));
    int64_t model_size = 0;
    // First call with a null buffer only queries the required size.
    LGBM_BoosterSaveModelToString(booster, 0, 0, C_API_FEATURE_IMPORTANCE_SPLIT, 0, &model_size, nullptr);
    std::vector<char> model_str(model_size);
    lg_errchk(LGBM_BoosterSaveModelToString(booster, 0, 0, C_API_FEATURE_IMPORTANCE_SPLIT, model_size, &model_size, model_str.data()));
    lg_errchk(LGBM_BoosterFree(booster));
    lg_errchk(LGBM_DatasetFree(train_dataset));
    return 0;
}
```
Another issue I noticed in the same context: when I set `device_type=gpu` and `gpu_device_id=X`, the program always uses the first of the 4 GPUs available on the system. I tried it on two different servers (4 x NVIDIA V100 and 4 x A100) with the same result.
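As a possible workaround for the device-selection issue (a sketch, not a fix for the underlying bug): the standard CUDA environment variable `CUDA_VISIBLE_DEVICES` restricts which physical GPUs the process can see, so device index 0 inside LightGBM maps to the card you actually want. The binary name below is a placeholder.

```shell
# Restrict the process to one physical GPU; "2" is the physical index
# reported by nvidia-smi. The CUDA runtime will then expose only that
# device, which LightGBM sees as device 0.
export CUDA_VISIBLE_DEVICES=2
# ...then launch the training binary from this shell, e.g.:
# ./your_training_binary
```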