pyoptree
Prediction uses wrong values in normalization
Hi! First of all, great work on implementing this!
I was checking your code and I caught an error in prediction. When I was checking the generated tree, I couldn't see the same number of elements in the leaves, and I noticed that the predict function uses the max and min values of the training dataset instead of the test dataset.
    def predict(self, data: pd.DataFrame):
        if not self.is_trained:
            raise ValueError("Model has not been trained yet! Please use `train()` to train the model first!")

        new_data = data.copy()
        new_data_cols = data.columns
        for col in self.P_range:
            if col not in new_data_cols:
                raise ValueError("Column {0} is not in the given data for prediction! ".format(col))
            col_max, col_min = self.normalizer[col]  # <====== this line
So I modified it to take the maximum and minimum values of each column from the test set:
    col_max = max(data[col])
    col_min = min(data[col])
    # col_max, col_min = self.normalizer[col]
I don't know if this was a mistake or intentional. If it was intentional, why is it this way?
Thanks and, again, great work!
Hi Victor,
Thanks for pointing this out! However, the prediction data needs to be normalized in the same way as the training data, so the col_max and col_min from the training data are used. That is why I used self.normalizer[col].
Thanks, Meng
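To illustrate the point about reusing the training statistics, here is a minimal sketch. It is not pyoptree's actual code: plain Python lists stand in for DataFrame columns, and the names `fit_min_max`, `apply_min_max`, and the 0.5 split threshold are hypothetical, chosen only for this example. It shows how a split learned on the training scale can misroute samples if the test data is rescaled with its own min and max.

```python
def fit_min_max(values):
    """Return (max, min) of a column, mirroring the (col_max, col_min) order above."""
    return max(values), min(values)


def apply_min_max(values, col_max, col_min):
    """Scale values with the given max/min; a constant column maps to 0.0."""
    span = col_max - col_min
    if span == 0:
        return [0.0 for _ in values]
    return [(v - col_min) / span for v in values]


# Hypothetical data: the training column spans 0..10, the test column only 6..8.
train_x = [0.0, 5.0, 10.0]
test_x = [6.0, 8.0]

# Assume the tree learned the split "x_normalized <= 0.5" on the training scale,
# which corresponds to a raw value of 5.0.
split_threshold = 0.5

# Correct: reuse the training max/min, so the test data lives on the same scale.
train_max, train_min = fit_min_max(train_x)
with_train_stats = apply_min_max(test_x, train_max, train_min)

# Incorrect: rescale with the test set's own max/min.
test_max, test_min = fit_min_max(test_x)
with_test_stats = apply_min_max(test_x, test_max, test_min)

# Which samples go to the left branch (normalized value <= threshold)?
goes_left_train_stats = [v <= split_threshold for v in with_train_stats]
goes_left_test_stats = [v <= split_threshold for v in with_test_stats]

print(goes_left_train_stats)  # [False, False]: both raw values exceed 5.0
print(goes_left_test_stats)   # [True, False]: raw 6.0 is wrongly sent left
```

With the training statistics, both test samples land on the right of the split, matching their raw values. With the test statistics, the sample with raw value 6.0 is mapped to 0.0 and sent to the wrong branch, because the threshold no longer refers to the same raw value.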