LightGBM
Training results differ between CUDA and CPU with the same dataset and parameters (LightGBM 4.1.0)
Description
The training results of the CUDA and CPU versions differ when using the same dataset and parameters.
Reproducible example
import numpy as np
import lightgbm as lgb
N, k = int(1e7), int(1e1)
np.random.seed(0)
X = np.random.normal(0, 1, (N, k))
beta = np.random.normal(0, 1, k)
epsilon = np.random.normal(0, 10, N)
Y = X.dot(beta) + epsilon
W = np.abs(np.random.normal(0, 1, N))
train_set = lgb.Dataset(X, label = Y, weight = W)
params = {
"objective": 'regression',
"max_bin": 63,
"num_leaves": 63,
"learning_rate": 0.1,
"force_row_wise": True,
'verbose': 1,
"deterministic": True,
}
params_gpu = params.copy()
params_gpu.update({'device_type': 'cuda'})
params_cpu = params.copy()
model_gpu = lgb.train(params_gpu, train_set, num_boost_round = 100)
model_cpu = lgb.train(params_cpu, train_set, num_boost_round = 100)
y_pred_gpu = model_gpu.predict(X)
y_pred_cpu = model_cpu.predict(X)
print(y_pred_gpu)
print(y_pred_cpu)
y_dif = np.abs(y_pred_gpu - y_pred_cpu)
print(np.max(y_dif), np.mean(y_dif))
The output is as follows:
[LightGBM] [Warning] Although "deterministic" is set, the results ran by GPU may be non-deterministic.
[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Warning] Although "deterministic" is set, the results ran by GPU may be non-deterministic.
[LightGBM] [Info] Total Bins 630
[LightGBM] [Info] Number of data points in the train set: 10000000, number of used features: 10
[LightGBM] [Info] Start training from score -0.001554
[LightGBM] [Warning] Although "deterministic" is set, the results ran by GPU may be non-deterministic.
[LightGBM] [Info] Total Bins 630
[LightGBM] [Info] Number of data points in the train set: 10000000, number of used features: 10
[LightGBM] [Info] Start training from score -0.001554
[ 0.29451256 -1.27554319 -4.77001593 ... 0.43338232 -3.89348605
-0.54135903]
[ 0.32018665 -1.26717548 -4.67964188 ... 0.4488524 -3.88445162
-0.5272684 ]
2.458828499401017 0.030497075739206972
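Beyond the max and mean absolute differences printed above, it can help to know what fraction of predictions agree within a tolerance. A minimal sketch in plain Python (the helper `compare_predictions` is hypothetical, not part of LightGBM); the tolerance rule is the one `numpy.isclose` uses, |a - b| <= atol + rtol * |b|:

```python
from typing import Dict, Iterable


def compare_predictions(a: Iterable[float], b: Iterable[float],
                        rtol: float = 1e-5, atol: float = 1e-8) -> Dict[str, float]:
    """Summarize element-wise differences between two prediction vectors.

    Reports the same max/mean absolute difference printed in the issue, plus
    the fraction of elements considered close under numpy.isclose's rule.
    """
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("prediction vectors must have the same length")
    if not a:
        raise ValueError("prediction vectors must be non-empty")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # An element counts as close if |a - b| <= atol + rtol * |b|.
    n_close = sum(1 for d, y in zip(diffs, b) if d <= atol + rtol * abs(y))
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "frac_close": n_close / len(diffs),
    }
```

For example, `compare_predictions(y_pred_gpu, y_pred_cpu)`. With 10 million rows a pure-Python loop is slow; the NumPy equivalent of the last field is `np.mean(np.isclose(y_pred_gpu, y_pred_cpu))`.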
Environment info
LightGBM version or commit hash: v4.1.0
Command(s) you used to install LightGBM
cmake -DUSE_CUDA=1 -DUSE_CPU=1 ..
The GPU device is an NVIDIA A100.
Additional Comments
@shiyu1994 can you please answer this one?
Any response?
Thanks for providing the example.
There may be minor implementation differences between the CUDA and CPU versions, but in general these do not lead to large differences in model performance. We will make more effort to keep the two versions as consistent as possible.
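One generic reason parallel and sequential implementations can diverge is that floating-point addition is not associative: summing the same gradients and hessians into histogram bins in a different order can change split gains slightly, a near-tie between candidate splits could then be resolved differently, and later trees would build on that difference. This is an illustration of the general effect, not a confirmed diagnosis of this issue:

```python
def sum_in_order(values):
    """Accumulate left to right, the way a single sequential loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]

print(sum_in_order(values))     # 0.0: the 1.0 is absorbed by 1e16 before cancellation
print(sum_in_order(reordered))  # 1.0: cancellation happens first, so the 1.0 survives
```

The same three numbers give different totals depending only on the order in which they are added.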
Do you observe a difference in the performance metrics of these two versions?
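Since the example trains with sample weights `W`, one way to compare the two models' quality is a weighted mean squared error on the same data. A minimal sketch (the `weighted_mse` helper is hypothetical; evaluating on a held-out set would be a fairer comparison than on the training data):

```python
from typing import Sequence


def weighted_mse(y_true: Sequence[float], y_pred: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Weighted mean squared error: sum(w * (y - p)^2) / sum(w)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * (y - p) ** 2 for y, p, w in zip(y_true, y_pred, weights)) / total_w
```

For example, compare `weighted_mse(Y, y_pred_gpu, W)` with `weighted_mse(Y, y_pred_cpu, W)`; for 10 million rows, `np.average((Y - y_pred_gpu) ** 2, weights=W)` computes the same quantity much faster.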
Performance: We expected CUDA to improve training speed, but in some cases the CUDA version was slower. The reason might be that our problem size is too small for an A100.
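When comparing speed, it helps to construct the `Dataset` once outside the timed region and to repeat each run, since the first GPU call typically also pays one-time setup costs. A minimal sketch using only the standard library (`time_callable` is a hypothetical helper):

```python
import time
from typing import Callable, List


def time_callable(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run fn() `repeats` times and return each wall-clock duration in seconds.

    time.perf_counter() is a monotonic, high-resolution clock suited to
    measuring intervals.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, after `train_set.construct()`, call `time_callable(lambda: lgb.train(params_gpu, train_set, num_boost_round=100))` and the same with `params_cpu`, then compare the minimum durations.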
Determinism: Is there any way to get exactly the same model from the CPU and GPU versions? @shiyu1994
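To check whether two runs produced exactly the same model, one option is to compare the text dumps from `Booster.model_to_string()`. The trailing parameters section legitimately differs between devices (for example `device_type`), so this sketch compares only the tree definitions. It assumes the tree section starts at the first line beginning with `Tree=` and ends at an `end of trees` line, as in the LightGBM model files we have seen; the helpers are hypothetical and the format should be checked against your version:

```python
import hashlib


def tree_section(model_text: str) -> str:
    """Return only the tree definitions from a LightGBM model text dump.

    Assumption: trees start at the first line beginning with 'Tree=' and end
    at the 'end of trees' line; everything else (header, importances,
    parameters) is dropped.
    """
    lines = model_text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("Tree=")), None)
    end = next((i for i, ln in enumerate(lines) if ln.strip() == "end of trees"), None)
    if start is None or end is None or end < start:
        raise ValueError("could not locate the tree section")
    return "\n".join(lines[start:end])


def tree_fingerprint(model_text: str) -> str:
    """SHA-256 hex digest of the tree section, handy for logging across runs."""
    return hashlib.sha256(tree_section(model_text).encode("utf-8")).hexdigest()


def same_trees(text_a: str, text_b: str) -> bool:
    """True if two model dumps contain identical tree sections."""
    return tree_section(text_a) == tree_section(text_b)
```

For example, `same_trees(model_gpu.model_to_string(), model_cpu.model_to_string())`. Values are compared as printed in the dump, so agreement is only as fine as the dump's precision.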
@w158rk What's the sample size and feature number of your datasets?
For now, the CUDA implementation still has some minor differences from the CPU version. You may try the older GPU version with device_type=gpu to see whether it produces results consistent with the CPU version.
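The older OpenCL-based GPU learner is selected with `device_type=gpu` and requires a LightGBM build with `-DUSE_GPU=1` rather than `-DUSE_CUDA=1`. A minimal sketch of deriving per-device parameter sets from the example's parameters; `make_device_params` is hypothetical, and setting the documented `gpu_use_dp` option (double-precision math on the OpenCL GPU, single precision by default) is our assumption about what brings results closer to CPU:

```python
from typing import Any, Dict

# Parameters shared by all runs (taken from the issue's example).
BASE_PARAMS: Dict[str, Any] = {
    "objective": "regression",
    "max_bin": 63,
    "num_leaves": 63,
    "learning_rate": 0.1,
    "force_row_wise": True,
    "verbose": 1,
    "deterministic": True,
}


def make_device_params(base: Dict[str, Any], device: str) -> Dict[str, Any]:
    """Return a copy of `base` configured for 'cpu', 'gpu' (OpenCL) or 'cuda'."""
    if device not in ("cpu", "gpu", "cuda"):
        raise ValueError(f"unknown device: {device!r}")
    params = dict(base, device_type=device)  # copy; `base` is left unchanged
    if device == "gpu":
        # Assumption: double precision on the OpenCL learner narrows the gap to CPU.
        params["gpu_use_dp"] = True
    return params
```

Each returned dict can be passed to `lgb.train` in place of `params_gpu` or `params_cpu`.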
Similar to the example: N, k = int(1e7), int(1e1), i.e. 10 million samples and 10 features.
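For a sense of scale, the raw and binned sizes of this dataset can be worked out directly. The float64 size follows from how `X` is created with `np.random.normal`; the one-byte-per-value binned size is a rough assumption (63 bins fit in a byte), not a statement about LightGBM's internal layout. Compared with the 40 GB or 80 GB of memory on an A100, either figure is small, which is consistent with, but does not prove, the guess that the problem is too small to keep the GPU busy:

```python
def dense_bytes(n_rows: int, n_cols: int, bytes_per_value: int) -> int:
    """Size in bytes of a dense n_rows x n_cols array."""
    return n_rows * n_cols * bytes_per_value


N, k = int(1e7), int(1e1)
raw = dense_bytes(N, k, 8)     # float64 feature matrix X
binned = dense_bytes(N, k, 1)  # rough assumption: one byte per bin index with max_bin=63

print(raw / 1e9, "GB raw")               # 0.8 GB raw
print(binned / 1e9, "GB binned (rough)")  # 0.1 GB binned (rough)
```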
OK, I'll try. Do you have any plans to ensure consistency between the CUDA and CPU versions? Can I expect this feature in the near future?
@w158rk Sure, we will work on consistency soon, perhaps in the next one or two releases.
However, I still find it unreasonable that the CUDA version would be slower than the CPU version in your example. I'll try your examples.
That's great! BTW, this example is just to illustrate the problem; our actual data size and model structure are different. We just want to make sure the CUDA version is practical to use, and we'll choose between the versions based on performance in our actual applications.
Sorry, I meant to write "unreasonable"; I've just corrected it. I'll profile the CUDA and CPU versions with your scripts.
Excluding dataset construction time, I ran the following code on one A100 GPU. The cuda version is about 6 times faster than the cpu version.
import numpy as np
import lightgbm as lgb
from time import time
N, k = int(1e7), int(1e1)
np.random.seed(0)
X = np.random.normal(0, 1, (N, k))
beta = np.random.normal(0, 1, k)
epsilon = np.random.normal(0, 10, N)
Y = X.dot(beta) + epsilon
W = np.abs(np.random.normal(0, 1, N))
train_set = lgb.Dataset(X, label = Y, weight = W, params={"max_bin": 63, "device_type": "cuda"})
train_set.construct()
params = {
"objective": 'regression',
"num_leaves": 63,
"learning_rate": 0.1,
"force_row_wise": True,
'verbose': 2,
"deterministic": True,
"num_threads": 16
}
params_gpu = params.copy()
params_gpu.update({'device_type': 'cuda'})
params_cpu = params.copy()
gpu_start = time()
model_gpu = lgb.train(params_gpu, train_set, num_boost_round = 100)
print("finished gpu in %f" % (time() - gpu_start))
cpu_start = time()
model_cpu = lgb.train(params_cpu, train_set, num_boost_round = 100)
print("finished cpu in %f" % (time() - cpu_start))
y_pred_gpu = model_gpu.predict(X)
y_pred_cpu = model_cpu.predict(X)
print(y_pred_gpu)
print(y_pred_cpu)
y_dif = np.abs(y_pred_gpu - y_pred_cpu)
print(np.max(y_dif), np.mean(y_dif))
[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Warning] Although "deterministic" is set, the results ran by GPU may be non-deterministic.
[LightGBM] [Warning] Although "deterministic" is set, the results ran by GPU may be non-deterministic.
[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.000000
[LightGBM] [Info] Total Bins 630
[LightGBM] [Info] Number of data points in the train set: 10000000, number of used features: 10
[LightGBM] [Debug] Adding init score = -0.001554
[LightGBM] [Info] Start training from score -0.001554
finished gpu in 2.202397
[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.000000
[LightGBM] [Info] Total Bins 630
[LightGBM] [Info] Number of data points in the train set: 10000000, number of used features: 10
[LightGBM] [Info] Start training from score -0.001554
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 7
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 7
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 7
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 7
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 7
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 8
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 7
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 8
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 8
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 9
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 9
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 9
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 9
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 9
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 9
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 9
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 10
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 10
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 10
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 10
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 10
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 17
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 11
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 17
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 18
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 16
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 16
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 18
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 13
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 17
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 20
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 21
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 19
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 12
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 20
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 17
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 17
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 15
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 17
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 14
[LightGBM] [Debug] Trained a tree with leaves = 63 and depth = 17
finished cpu in 12.829864
[ 0.29451256 -1.27554319 -4.77001595 ... 0.43338231 -3.89348603
-0.54135903]
[ 0.32018665 -1.26717548 -4.67964188 ... 0.4488524 -3.88445162
-0.5272684 ]
2.4588284718378937 0.030497075864870184
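The reported wall-clock times work out as follows, roughly 5.8x, in line with "about 6 times". These are single runs, with the CPU run using `num_threads: 16`. The printed max and mean prediction differences (about 2.46 and 0.030) match the original report, so the CPU/CUDA discrepancy itself persists in this run:

```python
gpu_seconds = 2.202397   # "finished gpu in 2.202397"
cpu_seconds = 12.829864  # "finished cpu in 12.829864"

speedup = cpu_seconds / gpu_seconds
print(f"CUDA speedup over CPU: {speedup:.2f}x")  # CUDA speedup over CPU: 5.83x
```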
True, this example runs faster with the CUDA version on my server as well. I'll let you know if I find an example with unexpected performance results. Thx!
Thanks. That would be very helpful!