
gbt dtrees always uses float instead of the real data type, causing unexpected prediction results

Open cmsxbc opened this issue 3 years ago • 5 comments

Describe the bug I'm converting a LightGBM model to a daal gbt model. The converted model's predictions differ substantially from LightGBM's predictions.

I found that the gbt dtrees model always uses float, as this code shows:

https://github.com/oneapi-src/oneDAL/blob/c6c5219c85e5bceb0392e54e653e00e8cc45e21f/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i#L44

If I change the typedef to typedef double ModelFPType, the result is exactly as expected.
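A minimal sketch of one way a float model field could change a prediction: a split threshold learned in double precision moves slightly when stored as float32, so a feature value close to it can take the other branch. The `goes_left` rule and the sample values are assumptions made for illustration; this is not oneDAL's actual traversal code.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (IEEE double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def goes_left(feature_value: float, threshold: float) -> bool:
    """Hypothetical split rule: take the left branch when value <= threshold."""
    return feature_value <= threshold


# Threshold as a double-precision trainer would store it.
threshold_double = 0.1
# The same threshold after being stored in a float (32-bit) model field.
threshold_float = to_float32(threshold_double)
# A feature value just above the double threshold.
x = 0.1000000001

print(threshold_float > threshold_double)  # float32 rounding moved the threshold up
print(goes_left(x, threshold_double))      # branch chosen with the double threshold
print(goes_left(x, threshold_float))       # branch chosen with the float threshold
```

With these values the two comparisons disagree, so the sample would reach a different leaf. Rounding of leaf values stored as float is a second possible source of drift that this sketch does not cover.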

To Reproduce Steps to reproduce the behavior:

  1. Use the example model from LightGBM's simple_example, and apply this change to generate a 200-iteration model so that the error is easier to observe:
index 9af83008..debb339c 100644
--- a/examples/python-guide/simple_example.py                                                                          
+++ b/examples/python-guide/simple_example.py
@@ -35,9 +35,10 @@ print('Starting training...')
 # train                                                   
 gbm = lgb.train(params,                                   
                 lgb_train,                                
-                num_boost_round=20,
+                num_boost_round=340,
                 valid_sets=lgb_eval,
-                early_stopping_rounds=5)
+                verbose_eval=False,
+                early_stopping_rounds=100)
  2. Convert the model and predict:
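import daal4py  # needed in addition to the imports already in simple_example.py

d4p_model = daal4py.get_gbt_model_from_lightgbm(gbm)
d4p_y_pred = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction.reshape(-1)
# take the square root so the printed value really is the RMSE
print("The rmse of daal4py's prediction is:", mean_squared_error(y_test, d4p_y_pred) ** 0.5)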

print('are preds of daal4py and lightGBM equal:', (d4p_y_pred == y_pred).all())
print('The rmse of daal4py vs lightGBM is:', mean_squared_error(d4p_y_pred, y_pred) ** 0.5)
  3. With the release package, the two prediction results are not equal.

  4. Apply the patch:

index 6b4ffc3af..16aac9b18 100644
--- a/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
+++ b/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
@@ -41,7 +41,7 @@ namespace prediction
 {
 namespace internal
 {
-typedef float ModelFPType;
+typedef double ModelFPType;
 typedef uint32_t FeatureIndexType;
 const FeatureIndexType VECTOR_BLOCK_SIZE = 64;
  5. Everything works as expected.

Expected behavior Use the real data type, or explain why float was chosen.

Output/Screenshots The upper screenshot shows the unexpected result; the lower one shows the result after I recompiled with typedef double ModelFPType.

2021-05-22-012202_1885x913_scrot

My patch: 2021-05-22-011158_1896x549_scrot

Environment:

  • OS: Arch Linux
  • Compiler: gcc 11.1.0
  • Version: 2021.2.2

cmsxbc avatar May 21 '21 17:05 cmsxbc

@cmsxbc thanks for reporting this bug, and sorry for the slow response! Could you share the actual mean_squared_error difference between float and double usage? We are currently checking the performance impact.

ShvetsKS avatar May 28 '21 11:05 ShvetsKS

@ShvetsKS Sorry for my poor English. I'm not sure what exactly is meant by "actual mean_squared_error difference for float vs double usage".

For the model I mentioned before, which is from the LightGBM example, mean_squared_error(daal4py_prediction_float, lightGBM_prediction) is 3.367366706112064e-06 when using float. After changing to double, mean_squared_error(daal4py_prediction_double, lightGBM_prediction) is exactly 0.

mean_squared_error(daal4py_prediction_float, daal4py_prediction_double) is also 3.367366706112064e-06, the same as mean_squared_error(daal4py_prediction_float, lightGBM_prediction).

I actually found this error in a model built with thousands of features and thousands of iterations, but I'm afraid I can't provide the actual values because of my company's data security policy.

What I can say is that sqrt(mean_squared_error(d4p_prediction_float, lightGBM_prediction)) / lightGBM_prediction is approximately 0.2, while d4p_prediction_double is exactly equal to lightGBM_prediction. We can tolerate sqrt(mean_squared_error(d4p_prediction, lightGBM_prediction)) / lightGBM_prediction < 1e-9.
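The tolerance check described above could be sketched roughly as follows. The comment does not say how the division by lightGBM_prediction is reduced to a single number, so this sketch assumes the RMSE is divided by the mean absolute reference prediction; the helper names and the sample values are hypothetical.

```python
import math


def rmse(pred, ref):
    """Root mean squared error between two equal-length sequences."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def relative_rmse(pred, ref):
    """RMSE divided by the mean absolute reference value (an assumed scale)."""
    scale = sum(abs(r) for r in ref) / len(ref)
    return rmse(pred, ref) / scale


TOLERANCE = 1e-9  # acceptable relative error quoted in the comment above

# Made-up predictions, used only to exercise the helpers.
reference = [0.5, 0.25, 0.75]
identical = list(reference)
perturbed = [0.6, 0.3, 0.9]

print(relative_rmse(identical, reference) < TOLERANCE)  # identical predictions pass
print(relative_rmse(perturbed, reference) < TOLERANCE)  # perturbed predictions fail
```

A relative measure like this makes the 1e-9 tolerance independent of the magnitude of the target, which a raw mean_squared_error is not.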

cmsxbc avatar May 28 '21 12:05 cmsxbc

@ShvetsKS Sorry to bother you, but is there any progress? T_T

cmsxbc avatar Jul 15 '21 06:07 cmsxbc

@cmsxbc we are currently working on an appropriate solution that preserves support for "old" trained models with float fields. Sorry for the slow delivery of the fix.

ShvetsKS avatar Jul 15 '21 09:07 ShvetsKS

@ShvetsKS Sorry again, but it has been almost another season.... T_T

cmsxbc avatar Oct 06 '21 15:10 cmsxbc