scikit-learn-intelex
[Question/Bug] Out of memory issues while building inference
Describe the bug Every time I try to convert my xgboost model (using daal4py.get_gbt_model_from_xgboost) I run out of memory. I moved the training script from my local machine to GCP, kept increasing resources, and finally gave up at n1-highmem-32 (204GB of RAM). Is there any way to reduce the number of workers, or otherwise make this task feasible? I understand that my model is quite complex, but 204GB of RAM should be sufficient for this kind of task.
To Reproduce Work in progress
Expected behavior Being able to build daal4py inference for my xgboost model with limited resources (e.g. 102GB of RAM).
Output/Screenshots None
Environment: GCP n1-highmem-32 machine. daal4py installed using pip.
Hi tizycki,
We checked the memory consumption of the model converter by saving all the necessary information to a file and performing a clean conversion (without importing xgboost). For a fairly large model of ours, the file size came out to a little under 20MB.
You can try to get the same file for your workload by doing model_xgb.dump_model(filename) and just checking its size.
Several runs showed RAM consumption slightly above 2x the size of the saved model throughout the conversion process, which is expected - we read the file and build an equivalent model in the oneDAL representation.
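For example, a minimal sketch of that size check - the dataset, hyperparameters, and filename below are placeholders, not your real workload, and for the sklearn wrapper dump_model lives on the underlying Booster:

import os
import xgboost as xgb
from sklearn.datasets import make_classification

# Placeholder model; substitute your own trained model_xgb here.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model_xgb = xgb.XGBClassifier(n_estimators=10, eval_metric='logloss').fit(X, y)

# Dump the tree structure to a text file and check its size on disk.
model_xgb.get_booster().dump_model('model_dump.txt')
print('dump size: %.1f MB' % (os.path.getsize('model_dump.txt') / 1e6))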
Could you please share more details about your workload? Your dataset and xgboost training parameters would be a good place to start.
@RukhovichIV
My model saved with model_xgb.save_model in xgb format is 48.2MB; saved as a pickle it is 91.6MB.
The classifier config is the following:
XGBClassifier(base_score=0.5, booster='gbtree',
colsample_bylevel=0.4503841871781403, colsample_bynode=1,
colsample_bytree=0.9195352964526833, eval_metric='logloss',
gamma=8.168958221061441e-09, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.07356404539935663,
max_delta_step=5, max_depth=91, min_child_weight=2, missing=nan,
monotone_constraints='()', n_estimators=388, n_jobs=-1,
num_parallel_tree=1, random_state=0,
reg_alpha=0.00010376808625045426, reg_lambda=476.96194787286544,
scale_pos_weight=1.3165669602830552, subsample=0.387658500562527,
tree_method='approx', use_label_encoder=False,
validate_parameters=1, verbosity=None)
I cannot share the model and dataset because of sensitive data, but I'll try to create a synthetic dataset to recreate this issue.
I can also add that the daal4py inference works when the classifier is trained on a small sample of the dataset, so it's not an environment issue.
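As a starting point, a sketch of the kind of synthetic reproduction I have in mind - the dataset size and most hyperparameters are placeholders taken loosely from the config above, not the real workload:

import daal4py
import xgboost as xgb
from sklearn.datasets import make_classification

# Hypothetical synthetic data; the real dataset is larger and more complex.
X, y = make_classification(n_samples=100_000, n_features=100, random_state=0)

clf = xgb.XGBClassifier(n_estimators=388, max_depth=91,  # deep trees, as in the real config
                        learning_rate=0.0736, subsample=0.388,
                        tree_method='approx', eval_metric='logloss', n_jobs=-1)
clf.fit(X, y)

# The conversion step that runs out of memory in the original report
daal_model = daal4py.get_gbt_model_from_xgboost(clf.get_booster())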
You can try reducing the number of threads used for inference by setting daalinit:
import daal4py
daal4py.daalinit(nthreads=2)
A good starting point would be to limit it to the number of physical cores on the system.
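For example (assuming psutil is available for counting physical cores):

import daal4py
import psutil

# Limit daal4py to one thread per physical core before running inference.
daal4py.daalinit(nthreads=psutil.cpu_count(logical=False))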