XGBoostError: Boolean is not supported
Hi all, I am running the RAPIDS NYCTaxi notebook via the Docker image rapidsai/rapidsai:0.10-cuda10.1-runtime-ubuntu18.04, but I get the error below at the training step. Any tips on how to fix it?
import dask_xgboost as dxgb_gpu

params = {
    'learning_rate': 0.3,
    'max_depth': 8,
    'objective': 'reg:squarederror',
    'subsample': 0.6,
    'gamma': 1,
    'silent': True,
    'verbose_eval': True,
    'tree_method': 'gpu_hist',
    'n_gpus': 1
}

trained_model = dxgb_gpu.train(client, params, X_train, Y_train, num_boost_round=100)
Traceback:
XGBoostError Traceback (most recent call last)
<timed exec> in <module>
/opt/conda/envs/rapids/lib/python3.6/site-packages/dask_xgboost/core.py in train(client, params, data, labels, dmatrix_kwargs, **kwargs)
233 """
234 return client.sync(_train, client, params, data,
--> 235 labels, dmatrix_kwargs, **kwargs)
236
237
/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
760 else:
761 return sync(
--> 762 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
763 )
764
/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
331 if error[0]:
332 typ, exc, tb = error[0]
--> 333 raise exc.with_traceback(tb)
334 else:
335 return result[0]
/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/utils.py in f()
315 if callback_timeout is not None:
316 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 317 result[0] = yield future
318 except Exception as exc:
319 error[0] = sys.exc_info()
/opt/conda/envs/rapids/lib/python3.6/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
/opt/conda/envs/rapids/lib/python3.6/site-packages/tornado/gen.py in run(self)
740 if exc_info is not None:
741 try:
--> 742 yielded = self.gen.throw(*exc_info) # type: ignore
743 finally:
744 # Break up a reference to itself
/opt/conda/envs/rapids/lib/python3.6/site-packages/dask_xgboost/core.py in _train(client, params, data, labels, dmatrix_kwargs, **kwargs)
193
194 # Get the results, only one will be non-None
--> 195 results = yield client._gather(futures)
196 result = [v for v in results if v]
197 if not params.get('dask_all_models', False):
/opt/conda/envs/rapids/lib/python3.6/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
1699 exc = CancelledError(key)
1700 else:
-> 1701 raise exception.with_traceback(traceback)
1702 raise exc
1703 if errors == "skip":
/opt/conda/envs/rapids/lib/python3.6/site-packages/dask_xgboost/core.py in train_part()
97 if dmatrix_kwargs is None:
98 dmatrix_kwargs = {}
---> 99 dtrain = xgb.DMatrix(data, labels, **dmatrix_kwargs)
100
101 elif labels[0] is None and isinstance(data[0], xgb.DMatrix):
/opt/conda/envs/rapids/lib/python3.6/site-packages/xgboost/core.py in __init__()
512 self._init_from_dt(data, nthread)
513 elif _use_columnar_initializer(data):
--> 514 self._init_from_columnar(data, missing)
515 else:
516 try:
/opt/conda/envs/rapids/lib/python3.6/site-packages/xgboost/core.py in _init_from_columnar()
651 _LIB.XGDMatrixCreateFromArrayInterfaces(
652 interfaces, ctypes.c_int32(has_missing),
--> 653 ctypes.c_float(missing), ctypes.byref(handle)))
654 self.handle = handle
655
/opt/conda/envs/rapids/lib/python3.6/site-packages/xgboost/core.py in _check_call()
199 """
200 if ret != 0:
--> 201 raise XGBoostError(py_str(_LIB.XGBGetLastError()))
202
203
XGBoostError: [16:36:13] /conda/conda-bld/xgboost_1571337679414/work/src/data/simple_csr_source.cu:161: Boolean is not supported.
Stack trace:
[bt] (0) /opt/conda/envs/rapids/lib/libxgboost.so(+0xc9594) [0x7f80d2a83594]
[bt] (1) /opt/conda/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleCSRSource::FromDeviceColumnar(std::vector<xgboost::Json, std::allocator<xgboost::Json> > const&, bool, float)+0x743) [0x7f80d2c66443]
[bt] (2) /opt/conda/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleCSRSource::CopyFrom(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, float)+0xc74) [0x7f80d2ade9e4]
[bt] (3) /opt/conda/envs/rapids/lib/libxgboost.so(XGDMatrixCreateFromArrayInterfaces+0x1c8) [0x7f80d2a91b08]
[bt] (4) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f82df0f3630]
[bt] (5) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f82df0f2fed]
[bt] (6) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f82df10a00e]
[bt] (7) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x13a45) [0x7f82df10aa45]
[bt] (8) /opt/conda/envs/rapids/bin/python(_PyObject_FastCallDict+0x8b) [0x5603fddf67bb]
Does this work using just XGBoost on a subset of the data? That error seems to indicate that something in XGBoost can't handle a bool column.
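To see which columns could trigger "Boolean is not supported", you can list the boolean-typed columns before training. This is a minimal sketch that relies only on the pandas-style .dtypes attribute; the assumption is that dask and cuDF frames expose .dtypes the same way pandas does. The helper name boolean_columns and the toy column names are hypothetical.

```python
import pandas as pd


def boolean_columns(df):
    """Return the names of columns whose dtype is boolean.

    Only uses df.dtypes, so it should also work on dask or cuDF frames
    (an assumption: they expose a pandas-style .dtypes Series).
    """
    return [name for name, dtype in df.dtypes.items()
            if pd.api.types.is_bool_dtype(dtype)]


if __name__ == "__main__":
    # Hypothetical toy frame standing in for the taxi features.
    X = pd.DataFrame({
        "passenger_count": [1, 2, 1],
        "trip_distance": [1.2, 3.4, 0.8],
        "is_weekend": [True, False, True],
    })
    print(boolean_columns(X))  # ['is_weekend']
```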
Hi @TomAugspurger, thank you, you are right: I dropped the boolean column (which wasn't needed) and it worked. So what is the issue with dask-xgboost handling boolean columns, and where should I report this bug?
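If a boolean column is actually needed as a feature, an alternative to dropping it is to cast it to a numeric dtype before building the training data, so True/False become 1.0/0.0. This is a sketch under the assumption that DataFrame.astype with a column-to-dtype dict behaves the same on the dask/cuDF frames used in the notebook as it does in pandas; cast_booleans is a hypothetical helper, and float32 is just one reasonable target dtype.

```python
import pandas as pd


def cast_booleans(df, dtype="float32"):
    """Return df with every boolean column cast to a numeric dtype.

    Non-boolean columns are left untouched; if there are no boolean
    columns, df is returned unchanged.
    """
    bool_cols = [name for name, dt in df.dtypes.items()
                 if pd.api.types.is_bool_dtype(dt)]
    if not bool_cols:
        return df
    # astype with a dict only converts the listed columns.
    return df.astype({name: dtype for name in bool_cols})


if __name__ == "__main__":
    X = pd.DataFrame({"trip_distance": [1.2, 3.4], "is_weekend": [True, False]})
    print(cast_booleans(X).dtypes)
```

Usage in the notebook would then be something like X_train = cast_booleans(X_train) before calling dxgb_gpu.train.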
Does it work if you're just using xgboost itself?
I am working in multi-node mode; I haven't tried XGBoost by itself yet.
I'd recommend trying to pass a small subset of your data to a regular xgboost train call to see if it supports boolean columns.
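A rough sketch of that check: take the first rows of the features and labels and run a short, single-process xgboost.train on them. The helper names take_subset and trains_with_plain_xgboost are hypothetical. The code assumes X and y are partitioned identically, since dask's .head() only reads the first partition, so the rows line up. It also assumes that XGBoost reports unsupported column types as XGBoostError, ValueError, or TypeError. XGBoost is imported inside the function, so take_subset can be used without it.

```python
import pandas as pd


def take_subset(X, y, n_rows=1000):
    """Return the first n_rows of features and labels.

    On dask collections .head() computes from the first partition and
    returns a concrete (pandas or cuDF) object.
    """
    return X.head(n_rows), y.head(n_rows)


def trains_with_plain_xgboost(X, y, params, n_rows=1000, num_boost_round=5):
    """Try a short xgboost.train run on a subset; return True on success."""
    import xgboost as xgb  # lazy import: only needed for the actual check

    X_small, y_small = take_subset(X, y, n_rows)
    try:
        dtrain = xgb.DMatrix(X_small, label=y_small)
        xgb.train(params, dtrain, num_boost_round=num_boost_round)
    except (xgb.core.XGBoostError, ValueError, TypeError) as err:
        print(f"plain XGBoost failed: {err}")
        return False
    return True


if __name__ == "__main__":
    X = pd.DataFrame({"a": range(10), "flag": [True, False] * 5})
    y = pd.Series(range(10))
    X_small, y_small = take_subset(X, y, n_rows=3)
    print(X_small.shape, y_small.shape)  # (3, 2) (3,)
```

In the notebook this would be called as trains_with_plain_xgboost(X_train, Y_train, params), with the same params dict used for dask-xgboost.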