Nicolas Hug
Instead of summing over all the bins of an arbitrary feature histogram to compute `context.sum_gradients` and `context.sum_hessians`, we can pass those values directly to `find_node_split_subtraction`, since the parent's `split_info`...
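A minimal sketch of the subtraction idea (function and variable names here are hypothetical, not pygbm's actual API): one child's gradient/hessian sums can be obtained from the parent's sums and the sibling's sums, instead of re-summing histogram bins.

```python
import numpy as np

def child_sums_by_subtraction(parent_sum_g, parent_sum_h,
                              sibling_sum_g, sibling_sum_h):
    """Gradient/hessian sums of one child from the parent's and sibling's."""
    return parent_sum_g - sibling_sum_g, parent_sum_h - sibling_sum_h

# The subtraction gives the same result as summing the right child's bins:
rng = np.random.default_rng(0)
hist_g = rng.normal(size=32)    # per-bin gradient sums of one feature histogram
hist_h = rng.uniform(size=32)   # per-bin hessian sums
split_bin = 12
parent_g, parent_h = hist_g.sum(), hist_h.sum()
left_g, left_h = hist_g[:split_bin].sum(), hist_h[:split_bin].sum()
right_g, right_h = child_sums_by_subtraction(parent_g, parent_h, left_g, left_h)
```

This avoids an O(n_bins) reduction per feature when the parent's sums are already known.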
I think we can drop the `constant_hessian_value` in the `SplittingContext` and always assume the constant hessian value is `1`. We just have to scale the gradient values accordingly, to have...
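A hedged sketch of why this rescaling is safe (the gain formula below is the standard unregularized Newton gain, not pygbm's exact code): if every per-sample hessian equals a constant `c`, replacing the hessians by `1` and dividing the gradients by `c` multiplies every candidate split gain by the same factor `1/c`, so the best split is unchanged.

```python
import numpy as np

def split_gain(sum_g_left, sum_h_left, sum_g_right, sum_h_right):
    # Unregularized Newton gain: G^2/H per child, minus the parent's term.
    g, h = sum_g_left + sum_g_right, sum_h_left + sum_h_right
    return (sum_g_left ** 2 / sum_h_left
            + sum_g_right ** 2 / sum_h_right
            - g ** 2 / h)

rng = np.random.default_rng(0)
n, c, split = 100, 4.0, 30          # c is the constant per-sample hessian
grads = rng.normal(size=n)
gl, gr = grads[:split].sum(), grads[split:].sum()

gain_original = split_gain(gl, c * split, gr, c * (n - split))
# Same split, but hessians assumed to be 1 and gradients divided by c:
gain_rescaled = split_gain(gl / c, split, gr / c, n - split)
# gain_rescaled == gain_original / c, a uniform rescaling across all splits.
```

Since all gains are scaled by the same constant, the argmax over candidate splits is preserved.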
will investigate
A new grower (and a new SplittingContext) is created at each iteration, which may cause a memory-use peak on large datasets (#79). Instead of instantiating a new grower, we...
https://github.com/numba/numba/issues/3554 was fixed, so we can remove our temporary fix from #51 once the next version is released. Places to fix (ATM):
```
~/dev/pygbm » ag "array\[:0\] will" pygbm/splitting.py
250:...
```
Slightly related to #76. This is the second bullet point from https://github.com/ogrisel/pygbm/issues/69#issue-391170726. When early stopping (or just score monitoring) is done on the training data with the loss, we should...
As mentioned in #75, it'd be nice to allow score monitoring (both scoring and loss values on train / validation data) regardless of early stopping.
Opened https://github.com/numba/numba/issues/3588 to ask if there's an alternative.
Results are comparable to LightGBM when `n_samples >> n_bins`. In particular, on this very easy dataset (`target = X[:, 0] > 0`), LightGBM finds a perfect threshold of `1e-35` while that...
Following #214, there might be some numerical instability in the Pearson similarity computation, probably caused by the `sqrt` function receiving negative values, NaN, or infinity. I couldn't reproduce the issue...
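A hedged illustration of the suspected failure mode and a common guard (the constant below just simulates round-off residue; it is not taken from the actual report):

```python
import numpy as np

# A variance-like term that is mathematically >= 0 can come out as a tiny
# negative float due to round-off, in which case sqrt yields NaN.
var = -4.4e-17  # simulated round-off residue, e.g. from mean(x**2) - mean(x)**2
with np.errstate(invalid="ignore"):
    naive = np.sqrt(var)           # NaN
safe = np.sqrt(max(var, 0.0))      # clamping at zero before sqrt avoids the NaN
```

Clamping the argument (or checking `np.isfinite` on the inputs) would rule out this particular cause even if it turns out not to be the one behind #214.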