
KMeans Init Sparsity Support

Open md-shafiul-alam opened this issue 8 months ago • 8 comments

Add sparse data support to KMeans init and fix several bugs in the DAAL sparse KMeans++ init, the oneDAL KMeans++ init, and KMeans infer. Specific changes planned or made in this PR:

  • [x] Fix distance calculation for sparse data in DAAL KMeans++
  • [x] Allow oneDAL KMeans++ init to take n_trials, same as DAAL and scikit-learn
  • [x] Fix the difference between DAAL KMeans++ dense and sparse results
  • [x] Implement KMeans init sparse support on CPU (calls the DAAL CPU implementation)
  • [x] Fix oneDAL KMeans sparse infer on GPU
  • [x] Update KMeans infer for sparse data to allow the same result options as dense
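
To illustrate what "same results for dense and sparse" means for a KMeans++ init with n_trials, here is a minimal pure-Python sketch. It is not the DAAL or oneDAL code: the helper names (`dense_to_csr`, `csr_row`, `sq_dist_sparse`, `kmeans_pp_init`) and the CSR layout as three plain lists are assumptions made for this example. The sparse distance starts from ||c||^2 and corrects only the nonzero columns of the row, and the init draws n_trials candidates per step and keeps the one with the lowest total potential.

```python
import random

def dense_to_csr(rows):
    """Convert dense rows to (data, indices, indptr) CSR lists."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0.0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr

def csr_row(csr, i):
    """Return row i of a CSR matrix as a list of (column, value) pairs."""
    data, indices, indptr = csr
    lo, hi = indptr[i], indptr[i + 1]
    return list(zip(indices[lo:hi], data[lo:hi]))

def sq_dist_sparse(row, center):
    """Squared distance between a sparse row and a dense center.
    Start from ||c||^2, then correct only the nonzero columns of the row."""
    d = sum(c * c for c in center)
    for j, v in row:
        d += (v - center[j]) ** 2 - center[j] ** 2
    return d

def sq_dist_dense(x, center):
    """Squared distance between two dense vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, center))

def kmeans_pp_init(points, dist, densify, k, n_trials, seed):
    """Greedy KMeans++: per step, try n_trials candidates and keep the best."""
    rng = random.Random(seed)
    n = len(points)
    centers = [densify(points[rng.randrange(n)])]
    closest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(closest)
        best = None
        for _ in range(n_trials):
            # Sample an index with probability proportional to closest[i].
            r = rng.random() * total
            acc, idx = 0.0, n - 1
            for i, d in enumerate(closest):
                acc += d
                if r < acc:
                    idx = i
                    break
            cand = densify(points[idx])
            new_closest = [min(d, dist(p, cand)) for d, p in zip(closest, points)]
            pot = sum(new_closest)
            if best is None or pot < best[0]:
                best = (pot, cand, new_closest)
        _, cand, closest = best
        centers.append(cand)
    return centers

# Small integer-valued data so every floating-point operation is exact.
X = [
    [0.0, 0.0, 3.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 2.0, 0.0],
]
n_features = len(X[0])
csr = dense_to_csr(X)
sparse_rows = [csr_row(csr, i) for i in range(len(X))]

def to_dense(row):
    """Expand a sparse row into a dense vector of length n_features."""
    out = [0.0] * n_features
    for j, v in row:
        out[j] = v
    return out

centers_dense = kmeans_pp_init(X, sq_dist_dense, list, k=3, n_trials=3, seed=0)
centers_sparse = kmeans_pp_init(sparse_rows, sq_dist_sparse, to_dense,
                                k=3, n_trials=3, seed=0)
```

With the same seed and exact distances, both calls consume the random stream identically, so the dense and sparse paths pick the same centroids. On real-valued data the two distance formulas can differ by rounding, so a comparison there would need a tolerance.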

I have verified that:

  • [x] DAAL KMeans init results are the same for sparse and dense data
  • [x] oneDAL KMeans init results are the same for sparse and dense data
  • [ ] oneDAL KMeans init results are the same on CPU and GPU
    • Results differ for dense data unless the initial centroids for the dense GPU path are computed with the CPU implementation

md-shafiul-alam · Jun 07 '24 13:06