oneDAL
gbt dtrees always use float instead of the real data type, causing unexpected prediction results.
Describe the bug
I'm working on converting a lightGBM model to a daal gbt model, and the converted model's predictions differ noticeably from lightGBM's.
I have found that the gbt dtrees code always uses float, as shown here:
https://github.com/oneapi-src/oneDAL/blob/c6c5219c85e5bceb0392e54e653e00e8cc45e21f/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i#L44
If I change the typedef to typedef double ModelFPType, the result is exactly as expected.
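For context, here is a minimal numpy sketch (not oneDAL code, just an illustration) of why the float typedef matters: when every stored leaf value is narrowed to float32 and the per-tree contributions are summed over hundreds of trees, the individual rounding errors accumulate and the final prediction drifts away from the float64 result.

```python
import numpy as np

rng = np.random.default_rng(0)
# pretend these are the leaf values reached by one sample in a 340-tree model
leaf_values = rng.normal(scale=0.05, size=340)  # float64, as lightGBM stores them

pred_double = leaf_values.sum()                                      # full-precision sum
pred_float = leaf_values.astype(np.float32).sum(dtype=np.float64)    # float-narrowed model values

print(pred_double, pred_float, abs(pred_double - pred_float))        # small but nonzero gap
```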
To Reproduce
Steps to reproduce the behavior (a self-contained sketch of these steps follows the list):
- use the example model from lightGBM's simple_example, and apply this change to generate a model with about 200 iterations, which makes the error easier to observe:
```diff
index 9af83008..debb339c 100644
--- a/examples/python-guide/simple_example.py
+++ b/examples/python-guide/simple_example.py
@@ -35,9 +35,10 @@ print('Starting training...')
 # train
 gbm = lgb.train(params,
                 lgb_train,
-                num_boost_round=20,
+                num_boost_round=340,
                 valid_sets=lgb_eval,
-                early_stopping_rounds=5)
+                verbose_eval=False,
+                early_stopping_rounds=100)
```
- convert the model and predict (appended to simple_example.py, so mean_squared_error, X_test, y_test and y_pred are already defined there):

```python
import daal4py

d4p_model = daal4py.get_gbt_model_from_lightgbm(gbm)
d4p_y_pred = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction.reshape(-1)
print("The rmse of daal4py's prediction is:", mean_squared_error(y_test, d4p_y_pred) ** 0.5)
print('are preds of daal4py and lightGBM equal:', (d4p_y_pred == y_pred).all())
print('The rmse of daal4py vs lightGBM is:', mean_squared_error(d4p_y_pred, y_pred) ** 0.5)
```
- with the release package, the two prediction results are not equal.
- apply the patch:
```diff
index 6b4ffc3af..16aac9b18 100644
--- a/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
+++ b/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
@@ -41,7 +41,7 @@ namespace prediction
 {
 namespace internal
 {
-typedef float ModelFPType;
+typedef double ModelFPType;
 typedef uint32_t FeatureIndexType;
 const FeatureIndexType VECTOR_BLOCK_SIZE = 64;
```
- everything works as expected.
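Putting the steps above together, here is a self-contained sketch of the repro. It substitutes sklearn's make_regression for the CSV data used by lightGBM's simple_example and drops early stopping (both are my assumptions, not the original setup); the daal4py calls are exactly the ones shown above.

```python
import daal4py
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# synthetic regression data standing in for the simple_example CSVs
X, y = make_regression(n_samples=10000, n_features=50, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# many boosting rounds make the accumulated float32 rounding easier to observe
gbm = lgb.train({'objective': 'regression', 'verbose': -1},
                lgb.Dataset(X_train, y_train),
                num_boost_round=340)

lgb_pred = gbm.predict(X_test)
d4p_model = daal4py.get_gbt_model_from_lightgbm(gbm)
d4p_pred = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction.reshape(-1)

print('are preds of daal4py and lightGBM equal:', (d4p_pred == lgb_pred).all())
print('The rmse of daal4py vs lightGBM is:', mean_squared_error(d4p_pred, lgb_pred) ** 0.5)
```

With a release (float) build of oneDAL the equality check is expected to print False; after rebuilding with the double typedef the predictions should match, as described above.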
Expected behavior
Use the real data type, or document the reason why float was chosen.
Output/Screenshots
The upper screenshot shows the unexpected result, and the bottom one shows the output after recompiling with typedef double ModelFPType.
My patch:
Environment:
- OS: Arch Linux
- Compiler: gcc 11.1.0
- Version: 2021.2.2
@cmsxbc thanks for reporting this bug! Sorry for the late response!
Could you share the actual mean_squared_error difference for float vs double usage?
We are currently checking the performance impact.
@ShvetsKS Sorry for my poor English... I'm not sure what exactly is meant by "actual mean_squared_error difference for float vs double usage".
For the model from the lightGBM example I mentioned before, mean_squared_error(daal4py_prediction_float, lightGBM_prediction) is 3.367366706112064e-06 when using float. After changing to double, mean_squared_error(daal4py_prediction_double, lightGBM_prediction) is exactly 0.
mean_squared_error(daal4py_prediction_float, daal4py_prediction_double) is also 3.367366706112064e-06, i.e. the same as mean_squared_error(daal4py_prediction_float, lightGBM_prediction).
Actually, I found this error with a model built from thousands of features and thousands of iterations, but I'm afraid I can't provide the actual values because of my company's data security policy. What I can share is that sqrt(mean_squared_error(d4p_prediction_float, lightGBM_prediction)) / lightGBM_prediction is approximately 0.2, while d4p_prediction_double is exactly equal to lightGBM_prediction.
We can only tolerate sqrt(mean_squared_error(d4p_prediction, lightGBM_prediction)) / lightGBM_prediction < 1e-9.
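For clarity, here is one way to compute that relative-error criterion (a sketch; dividing the RMSE by the mean absolute lightGBM prediction is my reading of the expression above, and the argument names are placeholders):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def relative_rmse(d4p_pred, lgb_pred):
    """RMSE between the two prediction vectors, scaled by the typical prediction magnitude."""
    rmse = mean_squared_error(d4p_pred, lgb_pred) ** 0.5
    return rmse / np.abs(lgb_pred).mean()

# the tolerance mentioned above: relative_rmse(d4p_pred, lgb_pred) < 1e-9
```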
@ShvetsKS Sorry to bother you, but is there any progress? T_T
@cmsxbc We are currently working on an appropriate solution that keeps support for "old" trained models with float fields. Sorry for the slow fix delivery.
@ShvetsKS Sorry again, but it has taken almost another season... T_T