I want to construct bootstrap confidence intervals for XGBoost regression in Python. I based my code on this tutorial (https://machinelearningmastery.com/calculate-bootstrap-confidence-intervals-machine-learning-results-python/#comment-528118). Question: the histogram has only one bin, because I get a single value for the score across the n_iterations of the bootstrap. I believe the problem is related to the way I compute RMSE, but I have tried computing RMSE in several ways and could not solve it. How can this be solved?

import numpy
from pandas import read_csv
from sklearn.datasets import load_boston
from sklearn.utils import resample
from matplotlib import pyplot
from xgboost import XGBRegressor
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error

# load dataset
boston_dataset = load_boston()
df = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
df['MEDV'] = boston_dataset.target
values1 = df.values

# configure bootstrap
n_iterations = 1000
n_size = int(len(df) * 0.50)

# run bootstrap
stats = list()
for i in range(n_iterations):
    # prepare train and test sets
    train = resample(values1, n_samples=n_size)
    test = numpy.array([x for x in values1 if x.tolist() not in train.tolist()])

    model = XGBRegressor()  ## Final for the papers

    X_train = train[:,:-1]
    y_train = train[:,-1]
    X_test = test[:,:-1]
    y_test = test[:,-1]

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)  # make predictions

    def rmse_calculator(predicted, actual):
        assert len(predicted) == len(actual)
        return np.sqrt(np.mean(np.power(predicted - actual, 2)))

    score = rmse_calculator(y_test, predictions)

    #score = mean_squared_error(y_test, predictions) ** 0.5
    yt = np.asarray(y_test)
    y_pred = np.asarray(predictions)
    score = np.sqrt(mean_squared_error(yt, y_pred))
    print(score)
    stats.append(score)

# plot scores
pyplot.hist(stats)
pyplot.show()

# confidence intervals
alpha = 0.95
p = ((1.0-alpha)/2.0) * 100
lower = max(0.0, numpy.percentile(stats, p))
p = (alpha+((1.0-alpha)/2.0)) * 100
upper = min(1.0, numpy.percentile(stats, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))

Apr 02 '20 14:04 Shafi2016
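One thing worth checking in a loop like this is that fitting, scoring, and `stats.append` all run inside the `for` loop; if any of them sits outside it, `stats` ends up with a single score and the histogram has one bar. The following is a minimal sketch of the whole procedure under stated assumptions, not the tutorial's or the poster's exact method: it uses synthetic data, selects out-of-bag rows by index rather than by comparing row contents, and uses a hypothetical `LeastSquares` stand-in model so that it runs with NumPy alone. The names `LeastSquares` and `bootstrap_rmse` are illustrative only.

```python
import numpy as np


class LeastSquares:
    """Hypothetical stand-in regressor exposing fit/predict (not part of XGBoost)."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


def bootstrap_rmse(X, y, make_model, n_iterations=200, seed=0):
    """Return one out-of-bag RMSE per bootstrap iteration."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_size = int(n * 0.50)
    scores = []
    for _ in range(n_iterations):
        # draw training rows with replacement
        train_idx = rng.integers(0, n, size=n_size)
        # rows never drawn form the out-of-bag test set
        oob_mask = np.ones(n, dtype=bool)
        oob_mask[train_idx] = False
        # fit, predict, and score inside the loop, once per iteration
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[oob_mask])
        rmse = np.sqrt(np.mean((y[oob_mask] - predictions) ** 2))
        scores.append(float(rmse))
    return np.array(scores)


# synthetic regression data (an assumption; not the Boston dataset)
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(300, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + data_rng.normal(scale=1.0, size=300)

scores = bootstrap_rmse(X, y, LeastSquares)
lower, upper = np.percentile(scores, [2.5, 97.5])
print("distinct scores:", len(np.unique(scores)))
print("95%% interval for RMSE: %.3f to %.3f" % (lower, upper))
```

To try this with XGBoost, `XGBRegressor` can be passed as `make_model`, since it exposes the same `fit` and `predict` methods. The interval is left in the units of the target and is not clamped to the range 0 to 1.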

Try plotting the data to confirm there is a distribution. Perhaps there is not.

If there is, try changing the number of bins in the histogram plot.

Apr 02 '20 19:04 jbrownlee
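The check suggested above can also be done numerically before plotting. This is a small sketch with a hypothetical helper, `describe_scores`, and made-up score lists; it counts distinct values and non-empty histogram bins. If every score is identical, changing the number of bins cannot help, because all values fall into one bin.

```python
import numpy as np


def describe_scores(stats, bins=15):
    """Summarise a list of bootstrap scores (hypothetical helper)."""
    values = np.asarray(stats, dtype=float)
    counts, _edges = np.histogram(values, bins=bins)
    return {
        "n": int(values.size),
        "n_unique": int(np.unique(values).size),
        "min": float(values.min()),
        "max": float(values.max()),
        "non_empty_bins": int(np.count_nonzero(counts)),
    }


# made-up examples: one constant list, one list with spread
constant = describe_scores([3.2] * 100)
varied = describe_scores(np.linspace(2.5, 4.0, 100))
print(constant)
print(varied)
```

A result with `n` equal to 1, or `n_unique` equal to 1, points at the loop that fills the list rather than at the plotting call.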

Thanks a lot. Yes, I tried changing the number of bins, but it did not work:

sns.distplot(stats, hist=True, kde=False, bins=int(30/2), color = 'blue', hist_kws={'edgecolor':'black'})

I checked with the XGBoost classifier on this data (https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv). It works fine.

import numpy
from pandas import read_csv
from sklearn.utils import resample
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

# load dataset
data = read_csv('pima-indians-diabetes.data.csv', header=None)
values = data.values

# configure bootstrap
n_iterations = 100
n_size = int(len(data) * 0.50)

# run bootstrap
stats = list()
for i in range(n_iterations):
    # prepare train and test sets
    train = resample(values, n_samples=n_size)
    test = numpy.array([x for x in values if x.tolist() not in train.tolist()])
    # fit model
    model = XGBClassifier()
    model.fit(train[:,:-1], train[:,-1])
    # evaluate model
    predictions = model.predict(test[:,:-1])
    score = accuracy_score(test[:,-1], predictions)
    print(score)
    stats.append(score)

# plot scores
pyplot.hist(stats)
pyplot.show()

# confidence intervals
alpha = 0.95
p = ((1.0-alpha)/2.0) * 100
lower = max(0.0, numpy.percentile(stats, p))
p = (alpha+((1.0-alpha)/2.0)) * 100
upper = min(1.0, numpy.percentile(stats, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))

Apr 02 '20 20:04 Shafi2016
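A note on the interval step when moving from the classifier to the regressor: clamping the bounds to 0.0 and 1.0 and printing them as percentages suits accuracy, which lies between 0 and 1, but RMSE is in the units of the target and can exceed 1. The sketch below, with a hypothetical helper `percentile_interval` and made-up scores, makes the clamps optional so that the same percentile logic can serve both metrics.

```python
import numpy as np


def percentile_interval(stats, alpha=0.95, lower_bound=None, upper_bound=None):
    """Percentile interval with optional clamps (hypothetical helper)."""
    tail = (1.0 - alpha) / 2.0 * 100
    lower, upper = np.percentile(stats, [tail, 100 - tail])
    if lower_bound is not None:
        lower = max(lower_bound, lower)
    if upper_bound is not None:
        upper = min(upper_bound, upper)
    return float(lower), float(upper)


# made-up scores for illustration only
rmse_scores = [3.1, 3.4, 2.9, 3.8, 3.3, 3.0, 3.6]
acc_scores = [0.71, 0.74, 0.69, 0.77, 0.73, 0.70, 0.75]

# accuracy: clamping to [0, 1] is harmless
print(percentile_interval(acc_scores, lower_bound=0.0, upper_bound=1.0))
# RMSE: no upper clamp, so the interval stays in target units
print(percentile_interval(rmse_scores, lower_bound=0.0))
# RMSE with the accuracy-style clamp: the upper bound is forced down to 1.0
print(percentile_interval(rmse_scores, lower_bound=0.0, upper_bound=1.0))
```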

I also plotted the histogram of the predictions (XGBoost regression). It seems fine.

Apr 02 '20 21:04 Shafi2016

Hi, I get this error for the classifier: 'continuous is not supported'. How can I solve it?

May 16 '22 09:05 yahmadyar95
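The message 'continuous is not supported' appears to be what scikit-learn's classification metrics, such as `accuracy_score`, raise when the targets or predictions are real-valued rather than class labels, for example when classification code is run on a regression target. Since the failing code was not posted, this is an assumption about the cause. The sketch below uses hypothetical helper names and made-up data to detect real-valued arrays and to show one option, thresholding scores into labels before computing accuracy.

```python
import numpy as np


def looks_continuous(y):
    """True if any value is not a whole number (hypothetical check)."""
    values = np.asarray(y, dtype=float)
    return bool(np.any(values != np.round(values)))


def to_binary_labels(y, threshold=0.5):
    """Map real-valued scores to 0/1 labels; the threshold is an assumption."""
    return (np.asarray(y, dtype=float) >= threshold).astype(int)


def accuracy(y_true, y_pred):
    """Fraction of positions where the labels agree."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# made-up data for illustration only
y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.8, 0.6, 0.4, 0.3])

print(looks_continuous(y_true))
print(looks_continuous(y_score))
labels = to_binary_labels(y_score)
print(accuracy(y_true, labels))
```

Thresholding only makes sense when the values are scores for a two-class problem. If the target itself is real-valued, a regressor and a regression metric such as RMSE are the more usual choice.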

Hi Dmlc/Xgboost,

Thanks for asking.

I’m eager to help, but I just don’t have the capacity to debug code for you.

I am happy to make some suggestions:

Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
Consider cutting the problem back to just one or a few simple examples.
Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
Consider posting your question and code to StackOverflow.

Regards,

Jason Brownlee, Ph.D. Making Developers Awesome at Machine Learning

Do you need help with machine learning? Visit: MachineLearningMastery.com http://machinelearningmastery.com/


May 16 '22 23:05 jbrownlee

Bootstrap Confidence Intervals for XGBoost regression (Python)
