[QST] How to use GPU to load trained Random Forest model and predict?
What is your question? Hi,
I have pretrained several Random Forest (RF) models using cuRFC. I need to iterate through these models to make predictions and add the results to a DataFrame, but the process is currently very slow: iterating through 2233 models takes more than 3 hours. Is there an API I can use to make sure the GPU is actually accelerating the prediction step? (Hardware: 2x A100 40GB.)
```python
import os
import pickle

def pre_read_model(layer_count):
    """Load all pickled RF models for a given layer into a dict keyed by file name."""
    print('preread_' + str(layer_count))
    layer = layer_count
    folder_path = f'./Layer_{layer}/'
    model_files = [f for f in os.listdir(folder_path) if f.endswith('.pkl')]
    model_files.sort()
    models = {}
    print('load model')
    for model_file in model_files:
        with open(f"{folder_path}/{model_file}", "rb") as f:
            models[model_file] = pickle.load(f)
    models = dict(sorted(models.items()))
    return models
```
```python
import os
from tqdm import tqdm

def Add_NF(data, task_list, problem_mode, layer_count, c, workers):
    data = data.compute()  # materialize the Dask collection into a single DataFrame
    layer = layer_count - 1
    raw = data.copy()
    folder_path = f'./Layer_{layer}/'
    model_files = [f for f in os.listdir(folder_path) if f.endswith('.pkl')]
    model_files.sort()
    predict_features = data.drop(columns=task_list)  # keep only ECFP and transfer features
    global prob
    for l in range(1, layer_count):
        model_ = globals()['Layer_' + str(l) + '_models']
        for model_file in tqdm(model_files):
            model = model_[model_file]
            prob = model.predict_proba(predict_features)[1]  # assumes this selects the second (positive-class) probability column
            new_feature_name = f'new_feature_layer_{l}_feature_by_{model_file}'
            raw[new_feature_name] = prob
            del model
            del prob
```
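For reference, this is roughly the pattern I would expect if the data stayed on the GPU for the whole loop. It is only a minimal sketch, assuming the pickled objects are cuML `RandomForestClassifier` instances, that `models` and `predict_features` are the objects built above, and that the feature matrix fits in GPU memory:

```python
import cupy as cp

# Copy the feature matrix to the device once, so every predict_proba call
# reads GPU-resident data instead of transferring from host each time.
features_gpu = cp.asarray(predict_features.to_numpy(), dtype=cp.float32)

probs = {}
for name, model in models.items():
    # cuML accepts device arrays; with the default output type the result
    # is also a CuPy array, so the probabilities stay on the GPU.
    proba = model.predict_proba(features_gpu)
    probs[name] = proba[:, 1]  # positive-class probability column
```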
Thanks for the issue @m946107011. A quick question first: what is the data size for each of the models?
From the code I think each prediction is indeed running on the GPU, but I don't know if this is a great fit currently. Iterating through so many models carries significant overhead compared to a single large model prediction. That said, @hcho3 might be a good person to give feedback on parallel tree inference like this.
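One thing that may be worth experimenting with, since the per-model Python overhead seems to dominate here, is converting each trained forest to a FIL (Forest Inference Library) model once up front and running all predictions through that. This is only a rough sketch, under the assumption that the loaded objects are cuML `RandomForestClassifier` instances exposing `convert_to_fil_model()` and that `models` / `predict_features` come from the code above; the exact arguments may differ across cuML versions:

```python
import cupy as cp

# One-time conversion of each trained cuRFC model to a FIL model,
# done outside the prediction loop.
fil_models = {
    name: model.convert_to_fil_model(output_class=True)
    for name, model in models.items()
}

# Keep the feature matrix on the device for every prediction.
features_gpu = cp.asarray(predict_features.to_numpy(), dtype=cp.float32)

new_features = {}
for name, fil_model in fil_models.items():
    # FIL predicts the whole batch at once on the GPU.
    proba = fil_model.predict_proba(features_gpu)
    new_features[f'new_feature_by_{name}'] = cp.asnumpy(proba[:, 1])
```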
Thank you for your quick reply, @dantegd. The largest model is 100 MB, and the smallest is 852 KB. For the dataset, I use the HFS file format; the largest file is 61 MB, and the smallest is 35 KB.