segfault during predict

Open pseudotensor opened this issue 3 years ago • 13 comments

import pickle

# Load the pickled (model, data, predict kwargs) tuple and close the file afterwards
with open("lgbsegfault.pkl", "rb") as f:
    model, X, kwargs = pickle.load(f)
model.predict(X, **kwargs)

The data passed to predict can be anything; a single row or any other input fails. The only special thing about the model is extra_trees=True.

Backtrace:
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM4GBDT10PredictRawEPKdPdPKNS_27PredictionEarlyStopInstanceE+0x730)[0x7f4d47691880]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM4GBDT7PredictEPKdPdPKNS_27PredictionEarlyStopInstanceE+0x15)[0x7f4d47692585]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNSt17_Function_handlerIFvRKSt6vectorISt4pairIidESaIS2_EEPdEZN8LightGBM9PredictorC4EPNS9_8BoostingEiibbbbidEUlS6_S7_E3_E9_M_invokeERKSt9_Any_dataS6_OS7_+0x1fa)[0x7f4d4798ce6a]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(+0x40c05c)[0x7f4d4798405c]
/home/jon/minicondadai/lib/python3.6/site-packages/numpy/core/../../../.././libgomp.so.1(GOMP_parallel+0x42)[0x7f4e3d716e8c]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM7Booster7PredictEiiiiiSt8functionIFSt6vectorISt4pairIidESaIS4_EEiEERKNS_6ConfigEPdPl+0x205)[0x7f4d47990ce5]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(LGBM_BoosterPredictForMat+0xd1)[0x7f4d479800c1]
/home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)[0x7f4e40b02630]
/home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d)[0x7f4e40b01fed]
/home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce)[0x7f4e3f438f9e]
/home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5)[0x7f4e3f4399d5]

lgbsegfault.pkl.zip

pseudotensor avatar Apr 02 '21 20:04 pseudotensor

Is it possible to provide a self-contained reproducible example? I tried to reproduce the problem from your description as follows, but this runs without error.

import numpy as np
import lightgbm as lgb
import pickle

x = np.random.random([10, 2])
y = np.random.choice([0, 1], 10)

lgb_data = lgb.Dataset(x, label=y)
est = lgb.train({'objective': 'binary', 'extra_trees': True}, lgb_data, num_boost_round=5)

with open('pickled_model', 'wb') as f:
    pickle.dump([est, x, {'num_iteration': 3}], f, pickle.HIGHEST_PROTOCOL)

with open('pickled_model', 'rb') as f:
    model, X, kw_args = pickle.load(f)

model.predict(X, **kw_args)

btrotta avatar Apr 08 '21 10:04 btrotta

Sorry, did you try the script and pickle I provided? That is quite contained.

pseudotensor avatar Apr 08 '21 10:04 pseudotensor

I meant could you provide a script that includes the model training and pickle steps (ideally using random data or a built-in dataset from sklearn etc). That would help isolate the source of the problem.

As a workaround, you could try using LightGBM's save_model and load_model (https://lightgbm.readthedocs.io/en/latest/Python-Intro.html#training) rather than pickle.

btrotta avatar Apr 09 '21 09:04 btrotta

The predict that failed is at end of non-trivial steps. It has nothing to do with pickle itself. It's just that lightgbm segfaults with this particular model.

pseudotensor avatar Apr 09 '21 16:04 pseudotensor

I encountered a similar issue. I am deploying a LightGBM model within the Tornado framework. The model is loaded from a pickle file in the init() function and works normally there, but when I call the model (stored as an instance variable) from an instance method, predict() or predict_proba() causes a segmentation fault. I used faulthandler to trace the exact line that caused the segfault. The result is:

python3.6/site-packages/lightgbm/basic.py", line 656 in inner_predict

which is:

preds.ctypes.data_as(ctypes.pointer(ctypes.c_double))))
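For readers who want to get the same kind of trace, here is a rough sketch of the faulthandler approach using only the standard library. The crash is simulated with ctypes.string_at(0), which is an assumption standing in for the real predict() call; the child program and the helper name run_with_faulthandler are made up for illustration.

```python
import subprocess
import sys

# Child program: enable faulthandler, then deliberately crash.
# ctypes.string_at(0) reads from address 0 and stands in for the real
# model.predict(...) call, which is not reproduced here.
CHILD_SOURCE = """
import faulthandler
faulthandler.enable()  # dump the Python traceback on fatal signals such as SIGSEGV

import ctypes
ctypes.string_at(0)    # placeholder for the crashing predict() call
"""


def run_with_faulthandler(source):
    """Run `source` in a fresh interpreter so the crash cannot kill this process."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )


result = run_with_faulthandler(CHILD_SOURCE)
print("return code:", result.returncode)
print(result.stderr)  # contains the Python-level traceback of the crash
```

In a real service, calling faulthandler.enable() once at startup (or running with python -X faulthandler) should be enough; the subprocess here only keeps the simulated crash from taking down the calling script.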

Is there a way to solve or work around this issue? Please advise.

peixin-lin avatar Jan 13 '22 10:01 peixin-lin

@peixin-lin if you could provide some additional information, we'd be happy to investigate:

  • version of lightgbm you're using
  • text-format version of the model (can be obtained with Booster.save_model())
  • input data that causes this .predict() to segfault
  • exact way you're calling predict() (e.g., are you using predict() or predict_proba()? are you passing additional parameters?)

jameslamb avatar Jan 15 '22 04:01 jameslamb

Thanks for the reply. The details are below:

  • I first encountered the segfault with v3.3.2, so I downgraded to v2.3.1, but the problem persists.
  • I tried both the pickled model saved through the sklearn API (LGBMClassifier) and the .txt model generated by Booster.save_model(). Each was generated and loaded with the same version of lightgbm, and the model format makes no difference.
  • I tried different input shapes, values, dtypes and data structures (list, ndarray and matrix) and got the same segfault.
  • I tried both predict() and predict_proba(), with no additional parameters.

peixin-lin avatar Jan 17 '22 03:01 peixin-lin

Thanks for that information!

I might not have been clear... I'm asking if you can actually provide here the text-format model file and sample input data that cause prediction to segfault.

That way, we could try experimenting with a heavily-instrumented version of LightGBM to try to find the source of the segfault.

jameslamb avatar Jan 17 '22 03:01 jameslamb

@peixin-lin can you provide a reproducible example?

guolinke avatar Mar 02 '22 12:03 guolinke

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

github-actions[bot] avatar Apr 12 '22 04:04 github-actions[bot]

This shouldn't be closed. I gave an MRE for the fitted state, which should be good enough.

pseudotensor avatar Apr 12 '22 04:04 pseudotensor

@shiyu1994 can the new CUDA version run this example successfully?

guolinke avatar Apr 12 '22 05:04 guolinke

@guolinke I ran the example provided by @pseudotensor successfully with both the gpu and cuda_exp versions. Both versions produce the same output:

[LightGBM] [Warning] Unknown parameter: silent
[LightGBM] [Warning] Unknown parameter: predict_batching
[LightGBM] [Warning] seed is set=20070863, random_state=42 will be ignored. Current value: seed=20070863
[LightGBM] [Warning] num_threads is set=8, n_jobs=8 will be ignored. Current value: num_threads=8
[402720. 402720. 402720. ... 402720. 402720. 402720.]

@pseudotensor Could you please take a look at this output to see if it is identical to what you got in your previous trials?

shiyu1994 avatar May 09 '22 06:05 shiyu1994

Closing this due to lack of response to @shiyu1994's post from May 2022: https://github.com/microsoft/LightGBM/issues/4156#issuecomment-1120691622

jameslamb avatar Sep 05 '23 05:09 jameslamb

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions[bot] avatar Dec 06 '23 00:12 github-actions[bot]