Julien Jerphanion comments

Results: 438 comments of Julien Jerphanion

FEA Fused sparse-dense support for `PairwiseDistancesReduction`

I've extracted part of https://github.com/jjerphan/scikit-learn/pull/15 into https://github.com/scikit-learn/scikit-learn/pull/24223 to solve the problems appearing on some configurations there and to make the diff of this PR smaller.

FEA Fused sparse-dense support for `PairwiseDistancesReduction`

Benchmark results against `main` after merging #24223 into this branch, using https://github.com/jjerphan/scikit-learn/commit/2106e02c via `benchmarks/maint/pdr-sparse-support`, on a machine with 128 physical cores using 128 threads. Note that: - `'manhattan'` was...

FEA Fused sparse-dense support for `PairwiseDistancesReduction`

> Very nice. Could you please push the merge of https://github.com/scikit-learn/scikit-learn/pull/24223 to see the big picture and be able to concurrently review/test this PR? ~~Shouldn't we wait for @thomasjpfan's review...

FEA Fused sparse-dense support for `PairwiseDistancesReduction`

Additional benchmark results with `metric="euclidean"` and `np.float32` data via https://github.com/jjerphan/scikit-learn/commit/5bcac125 on a machine with 32 physical cores using 32 threads. We get speed-ups reaching at least ×100. ``` before...

FEA Fused sparse-dense support for `PairwiseDistancesReduction`

https://github.com/scikit-learn/scikit-learn/pull/24272 has been opened to resolve test failures observed in this PR.

MAINT Parameters validation for `SimpleImputer`

Using the constraint for `nan` and friends introduced by https://github.com/scikit-learn/scikit-learn/pull/24007 should resolve this issue.

FIX Correct `GaussianMixture.weights_` normalization

To me, we should choose a normalizer term such that after normalization `np.abs(np.sum(self.weights_) - 1.0)` is minimized. I think normalizing by the sum of weights makes sense in this case....
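The point about the normalizer can be illustrated with a small sketch. It is not the scikit-learn implementation: the soft counts `nk` and the helper names below are hypothetical, plain Python stands in for NumPy, and the drift of the counts away from `n_samples` is exaggerated to make the effect visible.

```python
# Minimal sketch (hypothetical values, not scikit-learn code): compare two
# ways of normalizing per-component soft counts into mixture weights.

def normalize_by_count(nk, n_samples):
    """Divide each soft count by the number of samples."""
    return [v / n_samples for v in nk]


def normalize_by_sum(nk):
    """Divide each soft count by the sum of all soft counts."""
    total = sum(nk)
    return [v / total for v in nk]


# Assumed soft counts whose total has drifted slightly away from n_samples,
# as can happen when responsibilities are accumulated with rounding error.
nk = [3.0002, 4.0001, 3.0004]
n_samples = 10

by_count = normalize_by_count(nk, n_samples)
by_sum = normalize_by_sum(nk)

# Deviation of the weights' sum from 1.0 for each normalizer.
err_count = abs(sum(by_count) - 1.0)
err_sum = abs(sum(by_sum) - 1.0)

print(f"normalized by n_samples: |sum - 1| = {err_count:.2e}")
print(f"normalized by sum:       |sum - 1| = {err_sum:.2e}")
```

Under these assumptions, dividing by the sum leaves only floating-point rounding error in `abs(sum(weights) - 1.0)`, while dividing by `n_samples` carries the full drift of the counts into the weights.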

FIX Correct `GaussianMixture.weights_` normalization

@kasmith11: Yes, definitely: it would be helpful!

Weights are being normalized using number of samples as opposed to sum in GaussianMixture

Hi @kshitijgoel007, thank you for reporting this. Do you by chance have a minimal reproducible example where this incorrect weighting causes a problem?

MNT Handle NaNs in scipy dev rankdata

OK, I added them regarding @lesteve's last remark: > Note: the change in scipy breaks backward-compatibility for the rank in the cv_results_ attribute. For nan scores, the associated rank will...