
Ensemble models

Open immortal678 opened this issue 3 years ago • 3 comments

In example 11.1.2, I am not able to understand the calculation of the weights of the unconstrained ensemble. If it were based on the training MAEs of the models, the ordering would be RF > Pen reg > NN > and so on, but the weights are not in line with this. I understand it is related to the correlation between the techniques, but if you could elaborate, I would highly appreciate it! Thanks in advance!

immortal678 avatar Apr 28 '21 15:04 immortal678

I copy-pasted the weights:

Pen_reg -0.584393293

Tree -0.074509616

RF 1.331785969

XGB -0.001696782

NN 0.328813723

So indeed RF is way above all the others. The problem is that, because the weights must sum to one, giving more weight to RF forces the ensemble to "short" other models, and Pen_reg is the chosen short leg. One important driver that is not shown in the example is the variance of the errors: I suspect that the variance of the Pen_reg errors is higher than that of the NN errors, which is why Pen_reg is penalized (!) in the ensemble weights. I leave it to you to confirm that (hopefully)!
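To illustrate the mechanism (this is not the book's code, and the covariance numbers below are made up): with a sum-to-one constraint, the weights that minimize the variance of the combined error are w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of the models' errors. A model whose errors are both noisier and highly correlated with a better model's errors ends up with a negative ("short") weight:

```python
import numpy as np

# Hypothetical error covariance for two models: errors are highly
# correlated (rho = 0.9) and the second model's errors have twice
# the standard deviation of the first's.
sigma = np.array([[1.0, 1.8],
                  [1.8, 4.0]])

ones = np.ones(2)
inv = np.linalg.inv(sigma)

# Variance-minimizing weights under the sum-to-one constraint:
# w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
w = inv @ ones / (ones @ inv @ ones)
print(w)
# The noisier, highly correlated model receives a negative weight:
# the ensemble shorts it to hedge the shared error component.
```

The same logic plays out with five models in the example: Pen_reg plays the role of the noisy, redundant model, so it becomes the short leg.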

shokru avatar Apr 28 '21 16:04 shokru

Hi, could you please point me to an outside reference for the optimized ensemble used in the book? I would highly appreciate it!

immortal678 avatar May 24 '21 11:05 immortal678

There is no outside reference. Ensembling (like ML in general) is a cooking recipe. In the book we test several recipes, and in this chapter they do not work very well, probably because the original models are too correlated: their predictions (the y_tilde) tell the same story, so trying to learn from them as if they offered different perspectives is bound to fail.
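A quick way to see why correlated models defeat ensembling (a synthetic sketch, not the book's data): if two models' errors share a large common component, averaging them barely reduces the error of either model alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y = rng.normal(size=n)            # target
common = rng.normal(size=n)       # error component shared by both models

# Two models whose errors are almost identical (highly correlated y_tilde):
# a large shared component plus a tiny idiosyncratic part.
pred_a = y + 0.5 * common + 0.05 * rng.normal(size=n)
pred_b = y + 0.5 * common + 0.05 * rng.normal(size=n)

def mae(pred):
    return np.abs(pred - y).mean()

mae_a = mae(pred_a)
mae_ens = mae((pred_a + pred_b) / 2)
print(mae_a, mae_ens)
# The equal-weight ensemble barely improves on a single model, because
# averaging cannot diversify away the shared error component.
```

With uncorrelated errors the same average would cut the error standard deviation by roughly a factor of √2; here almost all of that gain disappears.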

Sorry for the disappointment...

shokru avatar May 24 '21 11:05 shokru