oneDAL GBT dtrees always use float instead of the real data type, causing unexpected prediction results.

Describe the bug
I'm working on converting a LightGBM model to a DAAL GBT model, and the converted model's predictions differ substantially from LightGBM's.

I found that the gbt dtrees model always uses float, as this line of code shows:

https://github.com/oneapi-src/oneDAL/blob/c6c5219c85e5bceb0392e54e653e00e8cc45e21f/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i#L44

If I change the typedef to typedef double ModelFPType, the result is exactly as expected.
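One plausible way a float model type can change predictions, sketched under assumptions: if ModelFPType also holds the split thresholds converted from LightGBM's double-precision model, casting a threshold to float moves it slightly, and a feature value lying between the double and the float threshold then takes a different branch. The sketch assumes the split rule "go left when value <= threshold"; the threshold, the feature value and the helper goes_left are hypothetical and not taken from the oneDAL source.

```python
import numpy as np

# Hypothetical split threshold as LightGBM stores it (float64).
threshold64 = 0.1
# The same threshold after a cast to float32, as an assumed float ModelFPType would hold it.
threshold32 = float(np.float32(threshold64))

# Hypothetical feature value lying strictly between the two thresholds.
x = 0.1000000001


def goes_left(value, threshold):
    # Assumed split rule: go to the left child when value <= threshold.
    return value <= threshold


print(threshold32)                # 0.10000000149011612
print(goes_left(x, threshold64))  # False: routed right by the double threshold
print(goes_left(x, threshold32))  # True: routed left by the float threshold
```

A single flipped branch swaps one leaf for another, so the error it causes is bounded by the gap between leaf values rather than by float rounding.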

To Reproduce
Steps to reproduce the behavior:

Use the example model from LightGBM's simple_example, and apply this change to generate a 200-iteration model so the error is easier to observe:

```diff
index 9af83008..debb339c 100644
--- a/examples/python-guide/simple_example.py
+++ b/examples/python-guide/simple_example.py
@@ -35,9 +35,10 @@ print('Starting training...')
 # train
 gbm = lgb.train(params,
                 lgb_train,
-                num_boost_round=20,
+                num_boost_round=340,
                 valid_sets=lgb_eval,
-                early_stopping_rounds=5)
+                verbose_eval=False,
+                early_stopping_rounds=100)
```

Convert the model and predict:

```python
import daal4py
from sklearn.metrics import mean_squared_error

# gbm, X_test, y_test and y_pred (LightGBM's predictions) come from simple_example.py
d4p_model = daal4py.get_gbt_model_from_lightgbm(gbm)
d4p_y_pred = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction.reshape(-1)
print("The rmse of daal4py's prediction is:", mean_squared_error(y_test, d4p_y_pred) ** 0.5)

print('are preds of daal4py and lightGBM equal:', (d4p_y_pred == y_pred).all())
print('The rmse of daal4py vs lightGBM is:', mean_squared_error(d4p_y_pred, y_pred) ** 0.5)
```

With the release package, the two predictions are not equal.
Then apply the following patch:

```diff
index 6b4ffc3af..16aac9b18 100644
--- a/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
+++ b/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
@@ -41,7 +41,7 @@ namespace prediction
 {
 namespace internal
 {
-typedef float ModelFPType;
+typedef double ModelFPType;
 typedef uint32_t FeatureIndexType;
 const FeatureIndexType VECTOR_BLOCK_SIZE = 64;
```

With the patch applied, everything works as expected.

Expected behavior
Use the real data type, or explain why float is used.

Output/Screenshots
The upper part shows the unexpected result, and the lower part shows the result after recompiling with typedef double ModelFPType.

[Screenshot: 2021-05-22-012202_1885x913_scrot]

My patch: [screenshot: 2021-05-22-011158_1896x549_scrot]

Environment:

OS: Arch Linux
Compiler: gcc 11.1.0
Version: 2021.2.2

May 21 '21 17:05 cmsxbc

@cmsxbc Thanks for reporting this bug, and sorry for the slow response! Could you share the actual mean_squared_error difference between float and double usage? We are currently checking the performance impact.

May 28 '21 11:05 ShvetsKS

@ShvetsKS Sorry for my poor English. I'm not sure exactly what you mean by "actual mean_squared_error difference for float vs double usage".

For the LightGBM example model I mentioned before, mean_squared_error(daal4py_prediction_float, lightGBM_prediction) is 3.367366706112064e-06 with float. After changing to double, mean_squared_error(daal4py_prediction_double, lightGBM_prediction) is exactly 0.

mean_squared_error(daal4py_prediction_float, daal4py_prediction_double) is also 3.367366706112064e-06, the same value as mean_squared_error(daal4py_prediction_float, lightGBM_prediction), as expected since the double predictions equal LightGBM's exactly.
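A small but nonzero MSE like this is what you would expect if each tree's leaf value were rounded to float32 and the per-tree contributions summed in float. The sketch below assumes exactly that (it is not taken from the oneDAL source) and uses made-up leaf values for one sample across a hypothetical 200 trees.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trees = 200  # hypothetical number of boosting iterations

# Hypothetical float64 leaf values (as LightGBM stores them) reached by one sample, one per tree.
leaf_values = rng.normal(scale=0.5, size=n_trees)

# Double model: sum the float64 leaf values in float64.
pred_double = float(np.sum(leaf_values))

# Assumed float model: round every leaf value to float32 and accumulate in float32.
pred_float = float(np.sum(leaf_values.astype(np.float32), dtype=np.float32))

abs_diff = abs(pred_double - pred_float)
print(f"double: {pred_double!r}  float: {pred_float!r}  |diff|: {abs_diff:.3e}")
```

In this sketch rounding alone produces only a small per-sample difference, so on its own it would not account for large relative deviations.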

I actually found this error in a model with thousands of features and thousands of iterations, but I'm afraid I can't share the actual values because of my company's data security policy.

What I can say is that sqrt(mean_squared_error(d4p_prediction_float, lightGBM_prediction)) / lightGBM_prediction is approximately 0.2, while d4p_prediction_double is exactly equal to lightGBM_prediction. Our tolerance is sqrt(mean_squared_error(d4p_prediction, lightGBM_prediction)) / lightGBM_prediction < 1e-9.
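The tolerance above can be checked mechanically. The comment divides by lightGBM_prediction without saying how that vector is reduced to one number; the sketch below assumes the mean absolute LightGBM prediction. The helper relative_rmse and the sample predictions are hypothetical.

```python
import numpy as np

TOLERANCE = 1e-9  # tolerance quoted in the comment above


def relative_rmse(pred, reference):
    """RMSE between pred and reference, scaled by the mean absolute reference value.

    The scaling is an assumption: the original metric divides by the
    prediction vector without saying how it is reduced to a scalar.
    """
    pred = np.asarray(pred, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    rmse = np.sqrt(np.mean((pred - reference) ** 2))
    return rmse / np.mean(np.abs(reference))


# Hypothetical LightGBM predictions, and the same values rounded through float32.
lgbm_pred = np.array([0.1, 0.2, 0.3])
float_pred = lgbm_pred.astype(np.float32).astype(np.float64)

print(relative_rmse(lgbm_pred, lgbm_pred) <= TOLERANCE)   # identical predictions pass
print(relative_rmse(float_pred, lgbm_pred) <= TOLERANCE)  # float32 rounding alone fails
```

Even a single float32 rounding of the outputs exceeds a 1e-9 relative tolerance, which is why float storage cannot meet this requirement regardless of tree routing.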

May 28 '21 12:05 cmsxbc

@ShvetsKS Sorry to bother you, but is there any progress? T_T

Jul 15 '21 06:07 cmsxbc

@cmsxbc We are currently working on an appropriate solution that preserves support for "old" trained models with float fields. Sorry for the slow fix delivery.
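One property that helps with keeping old float models working: every float32 value is exactly representable as a float64, so widening stored float fields to double loses nothing. The sketch below checks this with NumPy on made-up thresholds; it does not model oneDAL's serialization format, and the split rule value <= threshold is an assumption.

```python
import numpy as np

# Hypothetical thresholds from an "old" model that was stored with float fields.
old_thresholds = np.array([0.1, 0.25, 1e-3, 12345.678], dtype=np.float32)

# Widening float32 -> float64 is exact, so a double-based model can hold them unchanged.
widened = old_thresholds.astype(np.float64)

# Narrowing back recovers exactly the original float32 values.
round_trip_ok = np.array_equal(widened.astype(np.float32), old_thresholds)

# For float32 inputs, split decisions (assumed rule: value <= threshold) are unchanged.
features = np.array([0.1, 0.3, 0.001, 12345.0], dtype=np.float32)
same_routing = np.array_equal(features <= old_thresholds,
                              features.astype(np.float64) <= widened)

print(round_trip_ok, same_routing)  # True True
```

This only covers the stored values themselves; whether old models see identical predictions also depends on the precision of the input data.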

Jul 15 '21 09:07 ShvetsKS

@ShvetsKS Sorry again, but it has been almost another season... T_T

Oct 06 '21 15:10 cmsxbc
