
Wrong computations

Open arogozhnikov opened this issue 9 years ago • 0 comments

Hi Andrej,

thanks a lot for writing different ML demos.

When I was building my demo on gradient boosting, I initially planned to use your implementation of trees, but first I reviewed the code and found two problems:

  1. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L327 the entropy function computation is wrong, since p was overwritten. Fortunately, this isn't important at all: in fact, this summand is always omitted during computations, since it's just a global constant.

  2. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L299 the computed information gain is wrong. For some reason you compute only the impurity while ignoring the number of samples in the leaf (or at least the proportions).

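So, the correct formula for the leaf penalty with entropy (it's actually the log-likelihood up to sign, nothing else) is:

n log n - n_{+} log n_{+} - n_{-} log n_{-}

where n is the number of samples in the leaf and n_{+}, n_{-} are the counts of positive and negative samples in it. To get the improvement of a split, subtract the sum of the children's penalties from the parent's penalty. You can check that information gain written this way satisfies the basic properties you would expect, e.g. it is zero when both children have the same class proportions as the parent.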
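
A minimal sketch of that computation for the binary case, to make the formula concrete. This is not the forestjs code: the names `xlogx`, `leafPenalty` and `splitGain`, and the `{pos, neg}` count objects, are made up for illustration, and 0 log 0 is taken to be 0.

```javascript
// Hypothetical helper: x * log(x), with the convention 0 * log(0) = 0.
function xlogx(x) {
  return x > 0 ? x * Math.log(x) : 0;
}

// Leaf penalty: n log n - n_+ log n_+ - n_- log n_-, where n = n_+ + n_-.
// Unlike plain entropy, this grows with the number of samples in the leaf.
function leafPenalty(nPos, nNeg) {
  return xlogx(nPos + nNeg) - xlogx(nPos) - xlogx(nNeg);
}

// Improvement of a split: parent's penalty minus the sum of children's penalties.
// Each argument is an object of class counts, e.g. {pos: 4, neg: 4}.
function splitGain(parent, left, right) {
  return leafPenalty(parent.pos, parent.neg)
    - leafPenalty(left.pos, left.neg)
    - leafPenalty(right.pos, right.neg);
}

// A split that separates the classes perfectly vs. one that changes nothing.
const parent = { pos: 4, neg: 4 };
const perfect = splitGain(parent, { pos: 4, neg: 0 }, { pos: 0, neg: 4 });
const useless = splitGain(parent, { pos: 2, neg: 2 }, { pos: 2, neg: 2 });
console.log(perfect, useless);
```

Here the perfect split gains 8 log 2 (both children are pure, so their penalties vanish), while the split that keeps the 50/50 proportions in both children gains nothing, up to floating-point rounding.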

Hardly any other algorithm could bear such 'peculiarities' of implementation, but random forest works smoothly even in such a situation :) Cool, right?

arogozhnikov, Jul 13 '16 23:07