forestjs
Wrong computations
Hi Andrej,
thanks a lot for writing different ML demos.
When I was building my demo on gradient boosting, I initially planned to use your implementation of trees, but first I reviewed the code...
1. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L327
   The entropy computation is wrong, since `p` was overwritten. Fortunately, this isn't important at all: this summand is always omitted during computations anyway, since it's just a global constant.
2. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L299
   The computed information gain is wrong. For some reason you compute only the impurity, ignoring the number of samples in each leaf (or at least their proportions).
So, the correct formula for the leaf penalty with entropy (it's actually log-likelihood, nothing else) is:

n log n - n_{+} log n_{+} - n_{-} log n_{-}

where n is the number of samples in the leaf and n_{+}, n_{-} are the numbers of positive and negative samples. To get the improvement, subtract the sum of the children's penalties from the parent's penalty. You can check that information gain written this way satisfies the basic properties you'd expect.
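To make the formula concrete, here is a minimal sketch in JavaScript. The names `xlogx`, `leafPenalty` and `splitGain`, and the `{pos, neg}` count objects, are made up for illustration and are not part of forestjs; the sketch assumes binary labels and the natural logarithm.

```javascript
// x * log(x), using the usual convention 0 * log(0) = 0.
function xlogx(x) {
  return x > 0 ? x * Math.log(x) : 0;
}

// Penalty of a node with nPos positive and nNeg negative samples:
// n log n - n_+ log n_+ - n_- log n_-  (this equals n times the entropy).
function leafPenalty(nPos, nNeg) {
  return xlogx(nPos + nNeg) - xlogx(nPos) - xlogx(nNeg);
}

// Improvement of a split: parent's penalty minus the sum of children's penalties.
// `left` and `right` are hypothetical {pos, neg} count objects.
function splitGain(left, right) {
  const parent = leafPenalty(left.pos + right.pos, left.neg + right.neg);
  const children = leafPenalty(left.pos, left.neg) + leafPenalty(right.pos, right.neg);
  return parent - children;
}
```

Because the penalty is built from counts rather than proportions, leaf sizes are weighted automatically. For example, a pure leaf has zero penalty, a split that leaves the class proportions unchanged in both children gives (up to rounding) zero gain, and a split that separates the classes perfectly recovers the whole parent penalty.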
Hardly any other algorithm could bear such 'peculiarities' of implementation, but random forest works smoothly even in such a situation :) Cool, right?