dmba GainsChart() should include a measure of the total Actual value

Open MattDBailey opened this issue 3 years ago • 2 comments

I may be mistaken, but the random-draw reference line in gainsChart() should be based on the total of the actual values for prediction (or the total number of actual occurrences for classification), not on the total of the predicted values:

nActual = gains.sum() # number of desired records

"gains" is the list of predicted values.
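
To illustrate the point, here is a minimal sketch in plain Python. It is not dmba's implementation: the helper name `random_draw_line` and the toy numbers are made up. It assumes the random-draw reference line ends at the sum of whatever series is passed in, so passing predictions makes the line end at the total predicted value rather than the total actual value.

```python
def random_draw_line(values):
    """Return the two end points of a straight reference line.

    Hypothetical helper for illustration only: the line runs from (0, 0)
    to (n, total), where total is the sum of the values it is given,
    whatever those values represent.
    """
    n = len(values)
    total = sum(values)
    return [(0, 0), (n, total)]


# Toy data (made up): actual outcomes and a model's predictions for them
actual = [10, 40, 20, 30]
predicted = [15, 35, 25, 35]

line_from_predicted = random_draw_line(predicted)
line_from_actual = random_draw_line(actual)

print(line_from_predicted)  # ends at the total predicted value
print(line_from_actual)     # ends at the total actual value
```

With these toy numbers the two lines end at different heights, which is the discrepancy described above.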

Sep 22 '21 01:09 MattDBailey

The issue is not with the function itself. The problem is that the code for Figure 5.2 on page 132 passes predicted values instead of actual values (sorted by the predicted values):

pred_v = pd.Series(reg.predict(valid_X))
pred_v = pred_v.sort_values(ascending=False)

Instead of pred_v, the series passed to gainsChart() needs to hold the actual prices, sorted by these predictions.
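
A rough sketch of that ordering step in plain Python, using made-up toy data rather than the book's `reg` and `valid_X`: each actual value is paired with its prediction, the pairs are sorted by prediction (highest first), and the actual values are then accumulated. The variable names are illustrative only.

```python
from itertools import accumulate

# Toy data (made up for illustration)
actual = [10, 40, 20, 30]
predicted = [30, 35, 25, 15]

# Pair each prediction with its actual value, then order by prediction,
# highest first
pairs = sorted(zip(predicted, actual), key=lambda pair: pair[0], reverse=True)
actual_sorted = [a for _, a in pairs]

# Cumulative actual value captured as more records are selected
cumulative_gains = list(accumulate(actual_sorted))

print(actual_sorted)
print(cumulative_gains)
```

Because the accumulated values are the actual ones, the last cumulative value equals the total actual value, regardless of how good the predictions are.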

Sep 22 '21 01:09 MattDBailey

Hello Matt,

You are right. We identified this issue in the book about a year ago and (hopefully) fixed it with the code below. I can see that Wiley hasn't yet corrected the problem in the electronic version, and it also hasn't been corrected in the code available through the book's website.

Code for Figure 5.2:

# sort the actual values in descending order of the prediction
df = pd.DataFrame({
    'predicted': reg.predict(valid_X),
    'actual': valid_y,
})
df = df.sort_values(by=['predicted'], ascending=False)

fig, axes = plt.subplots(nrows=1, ncols=2)
ax = gainsChart(df['actual'], ax=axes[0])
ax.set_ylabel('Cumulative Price')
ax.set_title('Cumulative Gains Chart')

ax = liftChart(df['actual'], ax=axes[1], labelBars=False)
ax.set_ylabel('Lift')

plt.tight_layout()
plt.show()

The code for Figure 10.3 also needs correcting:

df = logit_result.sort_values(by=['p(1)'], ascending=False)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

gainsChart(df.actual, ax=axes[0])
liftChart(df.actual, title=False, ax=axes[1])
    
plt.tight_layout()
plt.show()

In this case, the call to liftChart was incorrect.
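
For the classification case, the following plain-Python sketch shows why the actual outcomes (ordered by predicted probability) are the natural input for a lift calculation. It is an illustration under stated assumptions, not dmba's `liftChart` code: the helper `group_lift`, the use of four groups instead of deciles, and the toy records are all made up.

```python
def group_lift(actual_sorted, n_groups):
    """Lift per group: mean outcome in the group divided by the overall mean.

    Hypothetical helper for illustration. `actual_sorted` must already be
    ordered by predicted probability, highest first, and its length is
    assumed to be divisible by n_groups.
    """
    size = len(actual_sorted) // n_groups
    overall_mean = sum(actual_sorted) / len(actual_sorted)
    lifts = []
    for g in range(n_groups):
        group = actual_sorted[g * size:(g + 1) * size]
        lifts.append((sum(group) / size) / overall_mean)
    return lifts


# Toy records (made up): (predicted probability of class 1, actual class)
records = [(0.2, 1), (0.9, 1), (0.4, 0), (0.7, 0),
           (0.8, 1), (0.1, 0), (0.6, 1), (0.3, 0)]

# Order the actual classes by predicted probability, highest first
records_sorted = sorted(records, key=lambda r: r[0], reverse=True)
actual_sorted = [actual for _, actual in records_sorted]

lifts = group_lift(actual_sorted, n_groups=4)
print(lifts)
```

Here the first group contains only actual positives, so its lift is twice the overall rate of 0.5.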

Sep 22 '21 14:09 gedeck
