[FEA] Implement `adjusted_mutual_info_score`
It would be great if `adjusted_mutual_info_score` were made available in cuML, especially since there is some evidence that it is preferable to the ARI in the presence of unbalanced cluster sizes, and it is slower to calculate, which makes a GPU implementation all the more useful.
Looking at https://github.com/scikit-learn/scikit-learn/blob/86906384af8ff337557486d262e88ef3c83c824f/sklearn/metrics/cluster/_supervised.py#L946 I think we already have a few of the components used to build this, though I haven't traced every helper (and the helpers those helpers use). Most of the work is probably input validation, followed by handling the edge cases that come up when computing the score.
@jpintar Thank you very much for the suggestion.
Hi @jpintar, I’d like to work on implementing adjusted_mutual_info_score in cuML. Looking forward to contributing!
@SumitdevelopAI I've assigned you to the issue. I'd recommend that you propose an implementation strategy before jumping in.
Thanks @csadorf — I’ll take this on. Below is my proposed implementation strategy (staged, reviewable, and designed to match sklearn semantics while enabling a later GPU-optimized EMI).
Proposed implementation strategy for adjusted_mutual_info_score in cuML
High-level goal
Implement `cuml.metrics.cluster.adjusted_mutual_info_score(labels_true, labels_pred, *, average_method='arithmetic', handle=None, convert_dtype=True)`, behaviorally compatible with `sklearn.metrics.adjusted_mutual_info_score`, reusing cuML building blocks and providing a correctness-first CPU reference EMI, followed by a GPU-optimized EMI in a follow-up PR.
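Intended usage, for reference (the cuML import is the proposed path and does not exist yet; the sklearn call shows the behavior to match):

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score as sk_ami

# Proposed cuML entry point (not yet available):
# from cuml.metrics.cluster import adjusted_mutual_info_score

labels_true = np.array([0, 0, 1, 1, 2, 2])
labels_pred = np.array([0, 0, 1, 2, 2, 2])

reference = sk_ami(labels_true, labels_pred, average_method="arithmetic")
# The cuML call should be drop-in compatible:
# adjusted_mutual_info_score(labels_true, labels_pred, average_method="arithmetic")
print(reference)
```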
Stage A — Correctness-first (single small PR)
Goal: deliver correct results (match sklearn) with minimal reviewer friction.
What I will add
- `python/cuml/metrics/cluster/adjusted_mutual_info_score.py` — Python wrapper implementing the public API and input validation (matching sklearn's `check_clusterings` semantics).
- `python/cuml/metrics/cluster/_expected_mutual_info_ref.py` — CPU reference implementation of EMI using the hypergeometric per-cell recurrence (lgamma init + iterative recurrence). This is correctness-first and easy to review.
- Unit tests: `python/cuml/metrics/cluster/tests/test_adjusted_mutual_info.py`, comparing outputs to `sklearn.metrics.adjusted_mutual_info_score` for deterministic edge cases and randomized tests (imbalanced clusters included).
- Minimal docs / docstring and a short example.
Implementation notes
- Reuse cuML contingency-building and `mutual_info_score`/entropy helpers to obtain `C`, `a`, `b`, `N` and MI/H(U)/H(V).
- EMI computed on CPU initially — exact per-cell sums (see the sketch below these notes):
  - `nmin = max(0, ai + bj - N)`, `nmax = min(ai, bj)`
  - `P(n+1) = P(n) * ((ai - n)*(bj - n)) / ((n+1)*(N - ai - bj + n + 1))`
  - accumulate `E_MI = Σ_{i,j} Σ_n f(n) * P(n)`, where `f(n) = (n/N) * log(N*n / (ai*bj))` is the per-cell MI term.
- Numeric: use `float64`, compute the initial `P` via `lgamma` where needed, and adopt sklearn's small-value clipping for the numerator/denominator to avoid 0/0.
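A minimal sketch of this CPU reference (pure NumPy + `math`; the function name and the dense-contingency input are illustrative, not an existing cuML API):

```python
# Correctness-first sketch of the CPU reference EMI described above.
# Takes a dense contingency matrix C (shape r x c) as a NumPy array.
import numpy as np
from math import lgamma, log


def expected_mutual_info_ref(C):
    """Exact expected mutual information of a contingency table (CPU reference)."""
    C = np.asarray(C, dtype=np.float64)
    a = C.sum(axis=1)          # row marginals a_i
    b = C.sum(axis=0)          # column marginals b_j
    N = C.sum()
    emi = 0.0
    for ai in a:
        for bj in b:
            nmin = int(max(0.0, ai + bj - N))
            nmax = int(min(ai, bj))
            if nmax == 0:
                continue
            # log P(nmin) of the hypergeometric pmf via lgamma (numerically safe)
            logp = (lgamma(ai + 1) + lgamma(bj + 1)
                    + lgamma(N - ai + 1) + lgamma(N - bj + 1)
                    - lgamma(N + 1) - lgamma(nmin + 1)
                    - lgamma(ai - nmin + 1) - lgamma(bj - nmin + 1)
                    - lgamma(N - ai - bj + nmin + 1))
            p = np.exp(logp)
            for n in range(nmin, nmax + 1):
                if n > 0:
                    # per-cell MI term f(n) weighted by P(n); the n = 0 term is zero
                    emi += p * (n / N) * log(N * n / (ai * bj))
                if n < nmax:
                    # hypergeometric recurrence: P(n+1) from P(n)
                    p *= (ai - n) * (bj - n) / ((n + 1) * (N - ai - bj + n + 1))
    return emi
```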
Why Stage A first: reviewers can validate correctness easily; users get AMI immediately. Performance work is staged later.
Stage B — GPU EMI (follow-up PR)
Goal: port EMI heavy-lift to CUDA/C++ for speed while preserving numerical semantics.
Candidate kernel designs
- One-block-per-nonempty-cell: each block computes P(n) recurrence for that cell and reduces f(n)*P(n).
- Row-parallel scheme: distribute columns across threads for rows with many columns.
- Use float64 accumulators and reduce with warp/shmem reductions. Start P at modal n where practical to avoid underflow; otherwise compute initial P via log-gamma and iterate.
Fallback heuristics
- If `r * c` (unique_true × unique_pred) exceeds a tunable threshold, fall back to CPU EMI or use the large-N approximation (AMI ≈ NMI). I will add configurable thresholds and document the behavior (see the dispatch sketch below).
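A sketch of the fallback dispatch (the threshold value and names are placeholders, not a decided API):

```python
# Hypothetical dispatch for the EMI fallback heuristic; threshold value and
# names are placeholders, not an agreed cuML API.
DEFAULT_CELL_THRESHOLD = 10_000_000  # tunable: r * c above which exact EMI is skipped


def emi_with_fallback(contingency, exact_emi, cell_threshold=DEFAULT_CELL_THRESHOLD):
    """Use exact EMI for small tables, the large-N approximation otherwise.

    `exact_emi` is any callable computing exact EMI (the CPU reference above,
    or the Stage B GPU kernel once it exists).
    """
    r, c = contingency.shape
    if r * c <= cell_threshold:
        return exact_emi(contingency)
    # Large-N approximation: treat EMI as negligible, so AMI reduces to NMI.
    return 0.0
```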
Deliverables
- `cpp/src/metrics/expected_mutual_information.cu` + bindings
- Python shim: `python/cuml/metrics/cluster/_expected_mutual_info_gpu.py`
- Microbenchmarks comparing the CPU reference and the GPU kernel.
Stage C — polish, CI & docs
- Add benchmarks script, performance notes to docs, and CI test selection (fast correctness tests only).
- Add changelog and example notebooks.
Test & validation plan
- Unit tests: deterministic edge cases + randomized seeds (imbalanced & extreme).
- Numerical parity: `np.testing.assert_allclose(cuml_ami, sklearn_ami, rtol=1e-8)`, relaxed if GPU rounding differences appear (see the test sketch below).
- Perf tests: representative medium/large workloads (kept out of baseline CI; optional benchmark job).
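A sketch of the parity test (parametrization, seeds, and the Dirichlet-based imbalance are placeholders; the cuML import path is the Stage A wrapper proposed above, which does not exist yet):

```python
# Randomized parity test against scikit-learn; assumes the Stage A wrapper
# exists at the proposed path cuml.metrics.cluster.adjusted_mutual_info_score.
import numpy as np
import pytest
from sklearn.metrics import adjusted_mutual_info_score as sk_ami

from cuml.metrics.cluster import adjusted_mutual_info_score as cu_ami


@pytest.mark.parametrize("seed", [0, 1, 2])
@pytest.mark.parametrize("n_samples,n_clusters", [(100, 3), (5_000, 50)])
def test_ami_matches_sklearn(seed, n_samples, n_clusters):
    rng = np.random.default_rng(seed)
    # Skewed class probabilities -> imbalanced cluster sizes.
    p = rng.dirichlet(np.full(n_clusters, 0.2))
    labels_true = rng.choice(n_clusters, size=n_samples, p=p)
    labels_pred = rng.choice(n_clusters, size=n_samples, p=p)
    np.testing.assert_allclose(
        cu_ami(labels_true, labels_pred),
        sk_ami(labels_true, labels_pred),
        rtol=1e-8,
    )
```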
Files to change (suggested)
- `python/cuml/metrics/cluster/adjusted_mutual_info_score.py` (wrapper)
- `python/cuml/metrics/cluster/_expected_mutual_info_ref.py` (CPU EMI)
- `python/cuml/metrics/cluster/_expected_mutual_info_gpu.py` (GPU shim, PR2)
- `cpp/src/metrics/expected_mutual_information.cu` (PR2)
- `python/cuml/metrics/cluster/tests/test_adjusted_mutual_info.py`
- `benchmarks/metrics/ami_bench.py` (optional)
References / provenance
- Implementation will follow the permutation/hypergeometric formulation used by sklearn's `_expected_mutual_info_fast` and Romano et al. (recurrence, complexity bounds, large-N approximation). I will cite sklearn and the paper in the PR.
If this approach looks good I will prepare Stage A: a small PR with the Python wrapper, CPU EMI reference implementation, tests, and docs. Please let me know if you want a different fallback policy or stricter tolerances in tests — otherwise I’ll prepare the PR and link it to this issue.
@SumitdevelopAI Thanks for the detailed plan. While I appreciate the thoroughness, the proposal is overly verbose and over-engineered for what's ultimately a single metric function. Let's simplify the approach:
Please implement everything in a single PR with the following structure:
- APIs to add (matching sklearn module structure):
  - `cuml.metrics.cluster.adjusted_mutual_info_score(labels_true, labels_pred, *, average_method='arithmetic')`
  - `cuml.metrics.cluster.expected_mutual_information(contingency)`
- Implementation approach:
  - Implement the functions `adjusted_mutual_info_score` and `expected_mutual_information` in `cuml.metrics.cluster._adjusted_mutual_info_score` and `cuml.metrics.cluster._expected_mutual_information` respectively.
  - The `adjusted_mutual_info_score` function should be importable from `cuml.metrics.cluster` and `cuml.metrics`. The `expected_mutual_information` function should be importable only from `cuml.metrics.cluster` (see the import sketch below).
  - Reuse existing cuML components where possible (e.g., `cuml.metrics.cluster.mutual_info_score()`)
  - Use CuPy for everything else; essentially translate the sklearn implementation to CuPy arrays/operations
  - If you encounter CuPy limitations during implementation, please raise them and we can discuss workarounds
  - We will evaluate the need for dedicated CUDA kernels and calls into RAFT/cuVS etc. in a second stage
- Testing:
  - Compare outputs directly against `sklearn.metrics.adjusted_mutual_info_score` and `sklearn.metrics.cluster.expected_mutual_information`
  - No need for a separate CPU reference implementation within cuML; we can use scikit-learn as the reference
Rationale:
- A three-stage approach delays GPU acceleration unnecessarily. Let's deliver a complete, GPU-accelerated solution in one PR.
- CuPy provides GPU acceleration and is straightforward to implement and review. We can evaluate custom CUDA kernels as well as RAFT and cuVS calls later.
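For illustration, the import wiring could look roughly like this (a sketch only; follow the existing `__init__.py` conventions in cuML rather than replacing their contents):

```python
# Sketch of the import wiring only; integrate with the existing exports.

# python/cuml/metrics/cluster/__init__.py
from cuml.metrics.cluster._adjusted_mutual_info_score import adjusted_mutual_info_score
from cuml.metrics.cluster._expected_mutual_information import expected_mutual_information

# python/cuml/metrics/__init__.py  (re-export AMI only; EMI stays cluster-only)
from cuml.metrics.cluster import adjusted_mutual_info_score
```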
Thanks @csadorf, I'll follow the simplified approach. I will implement `adjusted_mutual_info_score` and `expected_mutual_information` in a single PR (CuPy-based, importable as you described), add tests comparing outputs to sklearn, and open the PR here once ready.
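For concreteness, here is a rough sketch of the CuPy-based direction I'm planning (illustrative only: sklearn's special limit cases, `average_method` values other than `'arithmetic'`, and memory chunking of the EMI sum are omitted, and the MI/entropy pieces could be swapped for existing helpers such as `cuml.metrics.cluster.mutual_info_score`):

```python
# Rough CuPy sketch of the planned implementation (no custom kernels).
# The EMI inner sum is materialized as an (R, C, n) array, which may need
# chunking for very large label sets.
import numpy as np
import cupy as cp
from cupyx.scipy.special import gammaln


def _contingency(labels_true, labels_pred):
    """Dense contingency matrix of two label arrays (float64)."""
    labels_true = cp.asarray(labels_true).ravel()
    labels_pred = cp.asarray(labels_pred).ravel()
    _, ti = cp.unique(labels_true, return_inverse=True)
    _, pi = cp.unique(labels_pred, return_inverse=True)
    r, c = int(ti.max()) + 1, int(pi.max()) + 1
    return cp.bincount(ti * c + pi, minlength=r * c).reshape(r, c).astype(cp.float64)


def expected_mutual_information(contingency):
    """Expected MI of a contingency table under the permutation model."""
    C = cp.asarray(contingency, dtype=cp.float64)
    a, b, N = C.sum(axis=1), C.sum(axis=0), C.sum()
    nijs = cp.arange(1, int(cp.minimum(a.max(), b.max())) + 1, dtype=cp.float64)
    A, B, n = a[:, None, None], b[None, :, None], nijs[None, None, :]
    # Valid range per cell: max(1, a_i + b_j - N) <= n_ij <= min(a_i, b_j)
    valid = (n >= cp.maximum(1.0, A + B - N)) & (n <= cp.minimum(A, B))
    # log hypergeometric probability of n_ij given marginals a_i, b_j
    log_p = (gammaln(A + 1) + gammaln(B + 1) + gammaln(N - A + 1) + gammaln(N - B + 1)
             - gammaln(N + 1) - gammaln(n + 1) - gammaln(A - n + 1)
             - gammaln(B - n + 1) - gammaln(N - A - B + n + 1))
    f = (n / N) * (cp.log(N * n) - cp.log(A * B))   # per-cell MI term
    return float(cp.where(valid, f * cp.exp(log_p), 0.0).sum())


def _entropy(counts, N):
    p = counts[counts > 0] / N
    return float(-(p * cp.log(p)).sum())


def adjusted_mutual_info_score(labels_true, labels_pred, *, average_method="arithmetic"):
    # NOTE: sklearn's special cases (e.g. identical trivial clusterings -> 1.0)
    # are omitted in this sketch.
    C = _contingency(labels_true, labels_pred)
    a, b, N = C.sum(axis=1), C.sum(axis=0), C.sum()
    # Mutual information from the contingency table (could reuse
    # cuml.metrics.cluster.mutual_info_score instead).
    nz = C > 0
    p = C[nz] / N
    mi = float((p * (cp.log(C[nz]) + cp.log(N) - cp.log(a[:, None] * b[None, :])[nz])).sum())
    emi = expected_mutual_information(C)
    h_true, h_pred = _entropy(a, N), _entropy(b, N)
    if average_method == "arithmetic":
        normalizer = 0.5 * (h_true + h_pred)
    else:  # 'min', 'geometric', 'max' omitted in this sketch
        raise NotImplementedError(average_method)
    denominator = normalizer - emi
    # sklearn-style guard against 0/0
    eps = np.finfo(np.float64).eps
    denominator = max(denominator, eps) if denominator >= 0 else min(denominator, -eps)
    return float((mi - emi) / denominator)
```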
Thanks, I'll get started.
Hi @csadorf
I’ve been busy with my semester exams, but I’m now available again and can continue working on whatever parts of the AMI metric still need to be finalized.
Please let me know the current status and what remaining work is required. I’m ready to help move this issue forward and get it resolved.
@SumitdevelopAI Your previous PR (https://github.com/rapidsai/cuml/pull/7418) had lots of problems. I've pointed them out and closed it. You're welcome to create a new PR to address them if you like.
Hi @csadorf, if you would still like me to implement this, please let me know. I'm a first time contributor and would love to help.
@mani-builds Please feel free to open a PR.