XGBoost-From-Scratch
But do you have code for the estimated probability of a prediction?
Really good code, thanks! But do you have code for the estimated probability of a prediction, as mentioned in https://stats.stackexchange.com/questions/350134/how-does-gradient-boosting-calculate-probability-estimates? Per this discussion, https://github.com/dmlc/xgboost/issues/5640, it is important to understand in detail how this probability is calculated.
Hey Sandy,
If I understand your question correctly, you are asking how the "XGBoost" model calculates the probability for a single sample. For the binary classification case using log loss, it sums up the log-odds values of the terminal leaf nodes the sample falls into and then applies the inverse logit (sigmoid) function to squeeze that value into the 0-1 range.
I have a worked example of this process here https://medium.com/analytics-vidhya/what-makes-xgboost-so-extreme-e1544a4433bb
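As a very rough sketch of just that last step (the leaf values, learning rate and base score below are made-up numbers purely to illustrate the mechanics, not anything taken from the repo):

import numpy as np

def sigmoid(x):
    # Inverse logit: maps a log-odds value into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical leaf values (log-odds contributions) that one sample lands in
# across three boosted trees; learning rate and base score are also made up.
leaf_values = [0.4, -0.1, 0.25]
learning_rate = 0.3
base_score = 1.0

log_odds = base_score + learning_rate * sum(leaf_values)
probability = sigmoid(log_odds)
print(log_odds, probability)  # 1.165 -> roughly 0.762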
Great, so you did implement this! Your code is the best on the internet. Can you share some clue about where to find this code in your post and in your repo? Thank you very much, I will try to learn how your code works. Did you compare performance with the real xgboost?
Hi Sandy,
In the post I would recommend looking at the section "XGBoost By Hand", as I go through a step-by-step example there. This is what the "XGBoost" predict function looks like:
def predict(self, X):
    # Accumulate the scaled predictions (leaf log-odds) of each weak learner
    pred = np.zeros(X.shape[0])
    for estimator in self.estimators:
        pred += self.learning_rate * estimator.predict(X)
    # Add the base prediction of 1 and squeeze the summed log-odds into (0, 1)
    predicted_probas = self.sigmoid(np.full((X.shape[0], 1), 1).flatten().astype('float64') + pred)
    # Threshold at the mean predicted probability to get 0/1 labels
    preds = np.where(predicted_probas > np.mean(predicted_probas), 1, 0)
    return preds
You can see that for each sample we wish to predict, we loop through and add up the leaf values (the predictions from our weak learners). This summed value is not a probability yet but a log-odds value, since we are using log loss for the binary case. To turn it into a probability we apply the sigmoid function, which squeezes the log-odds into the 0-1 range. Then any sample whose value is greater than the mean probability across all samples in the dataset is given a prediction of 1, otherwise 0. However, this last thresholding step is not necessary if you just want the probabilities.
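So if you want the raw probabilities rather than the 0/1 labels, you could stop right before the thresholding step. Something along these lines should work (just a sketch; the predict_proba name is mine, and it assumes the same self.estimators, self.learning_rate and self.sigmoid attributes used in the predict function above):

def predict_proba(self, X):
    # Same accumulation as predict(), but return the sigmoid-squeezed
    # probabilities instead of thresholded 0/1 labels.
    pred = np.zeros(X.shape[0])
    for estimator in self.estimators:
        pred += self.learning_rate * estimator.predict(X)
    # Base prediction of 1 for every sample, matching predict() above
    base = np.full(X.shape[0], 1, dtype='float64')
    return self.sigmoid(base + pred)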
I honestly haven't compared the results with the real "XGBoost", just a simple cross-fold validation on a test dataset to check the accuracy, which I was happy with. I made this more as a learning exercise than a real implementation, so I wouldn't advise using it, but the core concepts behind it are the same as in the "XGBoost" paper.
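If you did want to do a quick side-by-side check against the real library, something like this would be a starting point (a sketch only; XGBoostClassifier stands in for the from-scratch class, whose actual name, constructor arguments and fit signature I'm assuming here, while xgboost.XGBClassifier is the real library's estimator):

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small synthetic binary classification problem
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Baseline: the real library
real_model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.3, max_depth=3)
real_model.fit(X_train, y_train)
print("xgboost accuracy:", accuracy_score(y_test, real_model.predict(X_test)))

# From-scratch model -- class name and arguments are assumptions,
# swap in whatever the repo actually exposes.
scratch_model = XGBoostClassifier(n_estimators=100, learning_rate=0.3)
scratch_model.fit(X_train, y_train)
print("from-scratch accuracy:", accuracy_score(y_test, scratch_model.predict(X_test)))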