Crash on Windows when `CUDA_VISIBLE_DEVICES` is set to `-1`
In a project where we combine XGBoost with TensorFlow within the same process, we ran into the following issue:
When the environment variable `CUDA_VISIBLE_DEVICES` is set to `-1`, the XGBoost predict step crashes after about a minute of predicting. Strangely enough, it seems to happen stochastically. The crash only occurs after predicting for a while, which can be provoked either by setting `nthread` to a low value or by repeating the same predict step many times. Doing the predict step once usually works without the crash, but not always.
The crash does not produce any error messages and only happens on Windows, as far as I can tell.
Here's a script to reproduce:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

from rich.progress import track
import xgboost as xgb


def main():
    features = xgb.DMatrix("features.buffer")
    model_filenames = ["model_20210416_HCD2021_B.xgboost"]
    prediction_list = []
    # Reload the model and predict repeatedly until the crash shows up.
    for model_filename in track(model_filenames * 100):
        xgb_model = xgb.Booster(model_file=model_filename)
        prediction_list.append(xgb_model.predict(features))
    print("Done without crashes!")


if __name__ == "__main__":
    main()
Commenting out the first two lines makes it work again.
pip freeze output:
numpy==2.0.2
scipy==1.13.1
xgboost==2.1.4
And with optional rich install for the progress bar (does not change the crash behavior):
markdown-it-py==3.0.0
mdurl==0.1.2
numpy==2.0.2
Pygments==2.19.1
rich==13.9.4
scipy==1.13.1
typing_extensions==4.12.2
xgboost==2.1.4
Files used: https://1drv.ms/u/c/cc884c602a30d109/ET6oclsK3PpLqnj6p4W0h40BU2vIMXQzQnOWRLl5SfecFw?e=eCoexz
May I ask why you need to set that environment variable to -1?
It's part of a dependency that uses TensorFlow and which is used before the dependency that uses XGBoost. In short, it's a pipeline that combines multiple machine learning predictors, each with their own purpose.
As a simple workaround we can definitely remove the environment variable before predicting with XGBoost. Nevertheless, it seemed sensible to report the issue.
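A minimal sketch of that workaround, using only the standard library. The helper name `without_env_var` is hypothetical, and it is an assumption that removing the variable this late is enough: in the reproduction script above, it is set before `xgboost` is imported, so the removal may need to happen before XGBoost is first loaded.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_env_var(name):
    """Temporarily remove an environment variable and restore it on exit."""
    saved = os.environ.pop(name, None)
    try:
        yield
    finally:
        # Only restore the variable if it was set in the first place.
        if saved is not None:
            os.environ[name] = saved


# Simulates the TensorFlow-based dependency having set the variable.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

with without_env_var("CUDA_VISIBLE_DEVICES"):
    # The XGBoost step would go here, e.g. xgb_model.predict(features).
    inside = os.environ.get("CUDA_VISIBLE_DEVICES")

restored = os.environ.get("CUDA_VISIBLE_DEVICES")
```

Inside the `with` block the variable is absent; afterwards it has its previous value again, so the TensorFlow-based steps of the pipeline still see `-1`.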
Thank you for sharing, I will try to look into it. Not familiar with debugging on Windows ...
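Since the crash produces no error message, one thing that might help narrow it down is Python's built-in `faulthandler` module, which dumps the Python traceback when the interpreter hits a fatal error. This is only a sketch: the log file name is an arbitrary choice, and it is an assumption that this particular crash is of a kind `faulthandler` can catch.

```python
import faulthandler

# Keep the file object alive for the lifetime of the process;
# faulthandler writes to its file descriptor when a fatal error occurs.
crash_log = open("crash_traceback.log", "w")

# Dump the tracebacks of all threads on a fatal error.
faulthandler.enable(file=crash_log, all_threads=True)

# ... run the reproduction script's main() after this point ...
```

If the process dies with a fatal error, `crash_traceback.log` should show which Python call was active at that moment.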