LightGBM
Should use other->numeric_feature_map_[i] in AddFeaturesFrom
Description
In this line: https://github.com/microsoft/LightGBM/blob/master/src/io/dataset.cpp#L1486 I think
int feat_ind = numeric_feature_map_[i];
should be
int feat_ind = other->numeric_feature_map_[i];
The loop condition, i < (other->numeric_feature_map_).size(), points the same way: the loop runs over the other dataset's map, so the body should index that map too.
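To illustrate the kind of mismatch being reported, here is a minimal Python sketch. It is a simplified model, not LightGBM's actual code: the function `append_feature_map`, its parameters, and the sample maps are hypothetical, and the offset logic is an assumption about what the loop is meant to do. It only shows what happens when a loop bounded by the other map's size indexes the dataset's own map instead.

```python
def append_feature_map(own_map, other_map, own_numeric_count, read_from_other=True):
    """Hypothetical model: extend own_map with other_map's entries.

    Entries > -1 are shifted by own_numeric_count; -1 is passed through.
    With read_from_other=False the loop mimics the reported mistake and
    indexes own_map while still being bounded by len(other_map).
    """
    source = other_map if read_from_other else own_map
    merged = list(own_map)
    for i in range(len(other_map)):
        feat_ind = source[i]
        merged.append(feat_ind + own_numeric_count if feat_ind > -1 else -1)
    return merged


own = [0]           # assumed: one numeric feature in the first dataset
other = [0, -1, 1]  # assumed: two numeric features and one non-numeric

# Proposed behaviour: read from the other dataset's map.
fixed = append_feature_map(own, other, own_numeric_count=1)

# Reported behaviour: read from the dataset's own, shorter map.
try:
    append_feature_map(own, other, own_numeric_count=1, read_from_other=False)
    buggy_failed = False
except IndexError:
    buggy_failed = True
```

Python raises IndexError on the out-of-range read. In C++, an out-of-range std::vector::operator[] is undefined behaviour rather than an exception, so the symptom there can differ from run to run.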
Reproducible example
Code
import lightgbm as lgb
import numpy as np

a = np.zeros((100, 1), dtype=np.float32)  # one constant (all-zero) column
b = np.random.normal(size=(100, 5))  # five random numeric columns

dataset_a = lgb.Dataset(a).construct()
dataset_b = lgb.Dataset(b).construct()

dataset_a.add_features_from(dataset_b)  # fails with the access violation below
Output
[LightGBM] [Warning] There are no meaningful features, as all feature values are constant.
[LightGBM] [Warning] Find the same feature name (Column_0) in Dataset::AddFeaturesFrom, change its name to (D2_Column_0)
Traceback (most recent call last):
  File "add.py", line 10, in <module>
    dataset_a.add_features_from(dataset_b)
  File "C:\test_lgb_add\lib\site-packages\lightgbm\basic.py", line 2437, in add_features_from
    _safe_call(_LIB.LGBM_DatasetAddFeaturesFrom(self.handle, other.handle))
OSError: exception: access violation reading 0x0000000000000000
Environment info
Command(s) you used to install LightGBM
pip install lightgbm==3.3.2
@guolinke @shiyu1994 can you please share your thoughts on this?
Thank you. I think it is a bug.
Thanks! Do you agree with the proposed change?
yeah, the fix looks good to me!
@EdmondElephant would you like to submit a PR with your proposed fix and your example as a test?
@jmoralez Submitted a PR: https://github.com/microsoft/LightGBM/pull/5434