LightGBM
segfault during predict
import pickle
model, X, kwargs = pickle.load(open("lgbsegfault.pkl", "rb"))
model.predict(X, **kwargs)
The input data does not seem to matter: predict fails on a single row or on any other input. The only unusual setting is extra_trees=True.
Backtrace:
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM4GBDT10PredictRawEPKdPdPKNS_27PredictionEarlyStopInstanceE+0x730)[0x7f4d47691880]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM4GBDT7PredictEPKdPdPKNS_27PredictionEarlyStopInstanceE+0x15)[0x7f4d47692585]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNSt17_Function_handlerIFvRKSt6vectorISt4pairIidESaIS2_EEPdEZN8LightGBM9PredictorC4EPNS9_8BoostingEiibbbbidEUlS6_S7_E3_E9_M_invokeERKSt9_Any_dataS6_OS7_+0x1fa)[0x7f4d4798ce6a]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(+0x40c05c)[0x7f4d4798405c]
/home/jon/minicondadai/lib/python3.6/site-packages/numpy/core/../../../.././libgomp.so.1(GOMP_parallel+0x42)[0x7f4e3d716e8c]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM7Booster7PredictEiiiiiSt8functionIFSt6vectorISt4pairIidESaIS4_EEiEERKNS_6ConfigEPdPl+0x205)[0x7f4d47990ce5]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(LGBM_BoosterPredictForMat+0xd1)[0x7f4d479800c1]
/home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)[0x7f4e40b02630]
/home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d)[0x7f4e40b01fed]
/home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce)[0x7f4e3f438f9e]
/home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5)[0x7f4e3f4399d5]
Is it possible to provide a self-contained reproducible example? I tried to reproduce the problem from your description as follows, but this runs without error.
import numpy as np
import lightgbm as lgb
import pickle
x = np.random.random([10, 2])
y = np.random.choice([0, 1], 10)
lgb_data = lgb.Dataset(x, label=y)
est = lgb.train({'objective': 'binary', 'extra_trees': True}, lgb_data, num_boost_round=5)
with open('pickled_model', 'wb') as f:
    pickle.dump([est, x, {'num_iteration': 3}], f, pickle.HIGHEST_PROTOCOL)
with open('pickled_model', 'rb') as f:
    model, X, kw_args = pickle.load(f)
model.predict(X, **kw_args)
Sorry, did you try the script and pickle I provided? That is quite contained.
I meant could you provide a script that includes the model training and pickle steps (ideally using random data or a built-in dataset from sklearn etc). That would help isolate the source of the problem.
As a workaround, you could try using LightGBM's save_model and load_model (https://lightgbm.readthedocs.io/en/latest/Python-Intro.html#training) rather than pickle.
The predict that failed comes at the end of a series of non-trivial steps. It has nothing to do with pickle itself. It's just that lightgbm segfaults with this particular model.
I encountered a similar issue. I am deploying an lgb model within the Tornado framework.
The model, loaded from a pickle file in the init() function, works normally there, but when I call the model (stored as an instance variable) in an instance method, predict() or predict_proba() causes a segmentation fault.
I used FaultHandler to trace the exact line that caused the segfault. The result is:
python3.6/site-packages/lightgbm/basic.py", line 656 in inner_predict
which is:
preds.ctypes.data_as(ctypes.POINTER(ctypes.c_double))))
Is there a way to solve or work around this issue? Please advise.
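For readers who want to reproduce the tracing step described above, here is a minimal sketch of using the standard-library faulthandler on a crashing child process. The crash is simulated with `ctypes.string_at(0)`, which is only a stand-in for the native predict() call, and the helper name `run_with_faulthandler` is hypothetical. Behavior is assumed for a POSIX system; on Windows the child may report the failure differently.

```python
import subprocess
import sys

# Child script that crashes on purpose: reading address 0 through ctypes
# stands in for the native call that segfaults (an assumption for this demo).
CRASHING_SCRIPT = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(script: str) -> subprocess.CompletedProcess:
    """Run `script` in a child interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        capture_output=True,
        text=True,
    )


result = run_with_faulthandler(CRASHING_SCRIPT)

# A non-zero return code means the child did not exit cleanly; on POSIX a
# negative value is the number of the signal that killed it.
print("return code:", result.returncode)
# faulthandler writes the Python-level traceback of the crash to stderr.
print(result.stderr)
```

Running the suspect call in a child process like this also keeps the parent (for example a web server) alive while the crash is being investigated.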
@peixin-lin if you could provide some additional information, we'd be happy to investigate:
- version of lightgbm you're using
- text-format version of the model (can be obtained with Booster.save_model())
- input data that causes this .predict() to segfault
- exact way you're calling predict() (e.g., are you using predict() or predict_proba()? are you passing additional parameters?)
Thanks for the reply. The details are as below:
- I first encountered the segfault with v3.3.2, so I downgraded to v2.3.1, but the problem still exists.
- I tried the pickle-format model saved through the sklearn API (LGBMClassifier) and the .txt-format model generated by Booster.save_model(). Each was generated and called with the same version of lightgbm, but the model format makes no difference.
- I tried different input shapes, values, dtypes and data structures (list, ndarray and matrix) and got the same segfault.
- I tried both predict() and predict_proba(), with no additional parameters.
Thanks for that information!
I might not have been clear... I'm asking if you can actually provide here the text-format model file and sample input data that causes prediction to segfault.
That way, we could try experimenting with a heavily-instrumented version of LightGBM to try to find the source of the segfault.
@peixin-lin can you provide a reproducible example?
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
This shouldn't be closed, I gave a MRE for the fitted state, should be good enough.
@shiyu1994 can the new CUDA version run this example successfully?
@guolinke I ran the example provided by @pseudotensor successfully with both the gpu and cuda_exp versions. Both versions produce the same output:
[LightGBM] [Warning] Unknown parameter: silent
[LightGBM] [Warning] Unknown parameter: predict_batching
[LightGBM] [Warning] seed is set=20070863, random_state=42 will be ignored. Current value: seed=20070863
[LightGBM] [Warning] num_threads is set=8, n_jobs=8 will be ignored. Current value: num_threads=8
[402720. 402720. 402720. ... 402720. 402720. 402720.]
@pseudotensor Could you please take a look at this output to see if it is identical to what you got in your previous trials?
Closing this due to lack of response from @shiyu1994 's post in May 2022: https://github.com/microsoft/LightGBM/issues/4156#issuecomment-1120691622
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.