Julien Jerphanion
I've extracted part of https://github.com/jjerphan/scikit-learn/pull/15 into https://github.com/scikit-learn/scikit-learn/pull/24223 to solve the problems appearing on some configurations there and to make the diff of this PR smaller.
Benchmark results against `main` after merging #24223 into this branch, using https://github.com/jjerphan/scikit-learn/commit/2106e02c via `benchmarks/maint/pdr-sparse-support` on a machine with 128 physical cores using 128 threads. Note that: - `'manhattan'` was...
> Very nice. Could you please push the merge of https://github.com/scikit-learn/scikit-learn/pull/24223 to see the big picture and be able to concurrently review/test this PR? ~~Shouldn't we wait for @thomasjpfan's review...
Additional benchmark results with `metric="euclidean"` and `np.float32` data via https://github.com/jjerphan/scikit-learn/commit/5bcac125 on a machine with 32 physical cores using 32 threads. We get speed-ups of ×100 and more. ``` before...
https://github.com/scikit-learn/scikit-learn/pull/24272 has been opened to resolve test failures observed in this PR.
Using the constraint for `nan` and friends introduced by https://github.com/scikit-learn/scikit-learn/pull/24007 should resolve this issue.
To me, we should choose a normalizer term such that after normalization `np.abs(np.sum(self.weights_) - 1.0)` is minimized. I think normalizing by the sum of weights makes sense in this case....
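As a minimal sketch of the sum-of-weights normalization suggested above: the helper name `normalize_weights` and the sample values are hypothetical and only illustrate the idea; this is not the estimator's actual code.

```python
import numpy as np


def normalize_weights(weights):
    """Scale weights so that they sum to 1 (hypothetical helper, for illustration)."""
    weights = np.asarray(weights, dtype=np.float64)
    total = weights.sum()
    if not np.isfinite(total) or total <= 0:
        raise ValueError("weights must have a finite, strictly positive sum")
    # Dividing by the sum drives np.abs(np.sum(weights) - 1.0) down to
    # floating-point rounding error.
    return weights / total


# Assumed example: weights that do not sum to 1 before normalization.
raw_weights = np.array([0.2, 0.5, 0.4])
weights = normalize_weights(raw_weights)
residual = np.abs(np.sum(weights) - 1.0)
```

After dividing by the sum, `residual` is zero up to floating-point rounding, which is the quantity the comment proposes to minimize.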
@kasmith11: Yes, definitely: it would be helpful!
Hi @kshitijgoel007, thank you for reporting this. Do you by chance have a minimal reproducible example where this incorrect weighting causes a problem?
OK, I added them regarding @lesteve's last remark: > Note: the change in scipy breaks backward-compatibility for the rank in the cv_results_ attribute. For nan scores, the associated rank will...