Nicolas Hug
Instead of summing over all the bins of an arbitrary feature histogram to compute `context.sum_gradients` and `context.sum_hessians`, we can pass those values directly to `find_node_split_subtraction`, since the parent's `split_info`...
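A minimal sketch of the subtraction idea (function and variable names here are hypothetical, not pygbm's actual API): one child's gradient/hessian sums can be obtained from the parent's sums and the sibling's sums, instead of re-summing histogram bins.

```python
import numpy as np

def child_sums_by_subtraction(parent_sum_g, parent_sum_h,
                              sibling_sum_g, sibling_sum_h):
    """Gradient/hessian sums of one child from the parent's and sibling's."""
    return parent_sum_g - sibling_sum_g, parent_sum_h - sibling_sum_h

# The subtraction gives the same result as summing the right child's bins:
rng = np.random.default_rng(0)
hist_g = rng.normal(size=32)    # per-bin gradient sums of one feature histogram
hist_h = rng.uniform(size=32)   # per-bin hessian sums
split_bin = 12
parent_g, parent_h = hist_g.sum(), hist_h.sum()
left_g, left_h = hist_g[:split_bin].sum(), hist_h[:split_bin].sum()
right_g, right_h = child_sums_by_subtraction(parent_g, parent_h, left_g, left_h)
```

This avoids an O(n_bins) reduction per feature when the parent's sums are already known.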
I think we can drop the `constant_hessian_value` in the `SplittingContext` and always assume the constant hessian value is `1`. We just have to scale the gradient values accordingly, to have...
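A hedged sketch of why this rescaling is safe (the gain formula below is the standard unregularized Newton gain, not pygbm's exact code): if every per-sample hessian equals a constant `c`, replacing the hessians by `1` and dividing the gradients by `c` multiplies every candidate split gain by the same factor `1/c`, so the best split is unchanged.

```python
import numpy as np

def split_gain(sum_g_left, sum_h_left, sum_g_right, sum_h_right):
    # Unregularized Newton gain: G^2/H per child, minus the parent's term.
    g, h = sum_g_left + sum_g_right, sum_h_left + sum_h_right
    return (sum_g_left ** 2 / sum_h_left
            + sum_g_right ** 2 / sum_h_right
            - g ** 2 / h)

rng = np.random.default_rng(0)
n, c, split = 100, 4.0, 30          # c is the constant per-sample hessian
grads = rng.normal(size=n)
gl, gr = grads[:split].sum(), grads[split:].sum()

gain_original = split_gain(gl, c * split, gr, c * (n - split))
# Same split, but hessians assumed to be 1 and gradients divided by c:
gain_rescaled = split_gain(gl / c, split, gr / c, n - split)
# gain_rescaled == gain_original / c, a uniform rescaling across all splits.
```

Since all gains are scaled by the same constant, the argmax over candidate splits is preserved.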
will investigate
A new grower (and a new SplittingContext) is created at each iteration, which may cause a memory-use peak on large datasets (#79). Instead of instantiating a new grower, we...
https://github.com/numba/numba/issues/3554 was fixed, so we can remove our temporary fix from #51 once the next version is released. Places to fix (ATM):
```
~/dev/pygbm » ag "array\[:0\] will" pygbm/splitting.py
250:...
```
Slightly related to #76. This is the second bullet point from https://github.com/ogrisel/pygbm/issues/69#issue-391170726. When early stopping (or just score monitoring) is done on the training data with the loss, we should...
As mentioned in #75, it'd be nice to allow score monitoring (both scoring and loss values on train / validation data) regardless of early stopping.
Opened https://github.com/numba/numba/issues/3588 to ask if there's an alternative.
Results are comparable to LightGBM when `n_samples >> n_bins`. In particular, on this very easy dataset (`target = X[:, 0] > 0`), LightGBM finds a perfect threshold of `1e-35` while that...
Following #214, there might be some numerical instability in the Pearson similarity computation, probably caused by the `sqrt` function receiving negative values, NaN, or infinity. I couldn't reproduce the issue...
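A hedged illustration of the suspected failure mode and a common guard (the constant below just simulates round-off residue; it is not taken from the actual report):

```python
import numpy as np

# A variance-like term that is mathematically >= 0 can come out as a tiny
# negative float due to round-off, in which case sqrt yields NaN.
var = -4.4e-17  # simulated round-off residue, e.g. from mean(x**2) - mean(x)**2
with np.errstate(invalid="ignore"):
    naive = np.sqrt(var)           # NaN
safe = np.sqrt(max(var, 0.0))      # clamping at zero before sqrt avoids the NaN
```

Clamping the argument (or checking `np.isfinite` on the inputs) would rule out this particular cause even if it turns out not to be the one behind #214.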