
Resume algorithm execution

Open bdelespierre opened this issue 2 years ago • 6 comments

I believe it would be nice to be able to resume the algorithm's execution after it has completed. This would be useful when new points are added, so that previous iterations don't need to be re-run.

Example: I have clustered my 100 000 users into 5 clusters. Since the last clustering, 100 new users have been added. Most of them are probably already very close to the existing clusters' centroids. Hence, I should be able to resume clustering the same dataset PLUS the new users to save time.

bdelespierre avatar Sep 03 '21 21:09 bdelespierre

It would be good to provide this feature as an option, because the added points also affect how the clusters are formed: clustering 100000 users and clustering 100100 users may give different results. So it would be nice to have 2 options when using the library:

  1. After 100000 users are clustered, 100 additional users are clustered
  2. Re-clustering 100100 users
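To make the two options concrete, here is a minimal sketch of both on toy data. It assumes plain Lloyd iterations over arrays of floats; the `lloyd()` helper is hypothetical and not part of php-kmeans. Option 1 seeds the run with the centroids from the previous run, option 2 starts again from a fresh initialization.

```php
<?php

declare(strict_types=1);

/**
 * Run Lloyd's k-means iterations starting from the given centroids.
 * Hypothetical helper for illustration only; not part of php-kmeans.
 *
 * @param list<list<float>> $points
 * @param list<list<float>> $centroids initial centroids (fresh, or from a previous run)
 * @return array{centroids: list<list<float>>, iterations: int}
 */
function lloyd(array $points, array $centroids, int $maxIterations = 100): array
{
    $dimensions = count($centroids[0]);

    for ($iteration = 1; $iteration <= $maxIterations; $iteration++) {
        // Assignment step: accumulate each point into its nearest centroid.
        $sums = array_fill(0, count($centroids), array_fill(0, $dimensions, 0.0));
        $counts = array_fill(0, count($centroids), 0);

        foreach ($points as $point) {
            $nearest = 0;
            $best = INF;
            foreach ($centroids as $index => $centroid) {
                $distance = 0.0;
                foreach ($point as $d => $value) {
                    $distance += ($value - $centroid[$d]) ** 2;
                }
                if ($distance < $best) {
                    $best = $distance;
                    $nearest = $index;
                }
            }
            foreach ($point as $d => $value) {
                $sums[$nearest][$d] += $value;
            }
            $counts[$nearest]++;
        }

        // Update step: move each non-empty centroid to the mean of its points.
        $updated = $centroids;
        foreach ($sums as $index => $sum) {
            if ($counts[$index] > 0) {
                $count = $counts[$index];
                $updated[$index] = array_map(fn (float $s): float => $s / $count, $sum);
            }
        }

        // Converged: the centroids did not move during this iteration.
        if ($updated === $centroids) {
            return ['centroids' => $centroids, 'iterations' => $iteration];
        }
        $centroids = $updated;
    }

    return ['centroids' => $centroids, 'iterations' => $maxIterations];
}

$existing = [[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [8.8, 9.2]];
$initial = [[0.0, 0.0], [10.0, 10.0]];
$first = lloyd($existing, $initial);

$newPoints = [[1.1, 1.1], [9.1, 8.9]];
$all = array_merge($existing, $newPoints);

// Option 1: resume from the previous centroids with the new points added.
$resumed = lloyd($all, $first['centroids']);

// Option 2: re-cluster everything from the fresh initialization.
$fresh = lloyd($all, $initial);
```

On this toy data both options end on the same centroids. On real data they may not, which is the reason for offering both.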

battlecook avatar Sep 06 '21 07:09 battlecook

Yes. I would propose something like:

$algo = new Kmeans\Algorithm(new Kmeans\RandomInitialization());

$result = $algo->clusterize($points, $nbClusters);

$serialized = serialize($result);

// later...

$previousRun = unserialize($serialized);

$result = $previousRun->resume($newPoints);

bdelespierre avatar Sep 06 '21 10:09 bdelespierre

looks good 👍

battlecook avatar Sep 06 '21 10:09 battlecook

I've been thinking about a result object for Algorithm::clusterize. What do you think of this API?

<?php

namespace Bdelespierre\Kmeans\Interfaces;

interface ClusterizationResultInterface extends \Serializable
{
    public function hasReachedConvergence(): bool;

    /**
     * @return int<0, max>
     */
    public function iterationsCount(): int;

    public function getClusters(): ClusterCollectionInterface;

    public function resume(PointCollectionInterface $newPoints): self;
}
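As a rough sketch of how such a result object could behave, the class below keeps the points and centroids it needs to resume and survives a `serialize()`/`unserialize()` round trip. Everything in it is an assumption for illustration: `ResumableResult` is a hypothetical name, it uses plain arrays instead of the library's collection interfaces, and it exposes `getCentroids()` rather than `getClusters()`. It also uses `__serialize()`/`__unserialize()`, since implementing `\Serializable` without them is deprecated as of PHP 8.1.

```php
<?php

declare(strict_types=1);

/** Hypothetical, simplified stand-in for the proposed result object. */
final class ResumableResult
{
    private array $points;
    private array $centroids;
    private int $iterations;
    private bool $converged;

    public function __construct(array $points, array $centroids, int $iterations = 0, bool $converged = false)
    {
        $this->points = $points;
        $this->centroids = $centroids;
        $this->iterations = $iterations;
        $this->converged = $converged;
    }

    public function hasReachedConvergence(): bool
    {
        return $this->converged;
    }

    public function iterationsCount(): int
    {
        return $this->iterations;
    }

    public function getCentroids(): array
    {
        return $this->centroids;
    }

    /** Continue iterating from the stored centroids with the new points added. */
    public function resume(array $newPoints = [], int $maxIterations = 100): self
    {
        $points = array_merge($this->points, $newPoints);
        $centroids = $this->centroids;
        $iterations = $this->iterations;

        for ($i = 0; $i < $maxIterations; $i++) {
            $updated = self::step($points, $centroids);
            $iterations++;
            if ($updated === $centroids) {
                return new self($points, $centroids, $iterations, true);
            }
            $centroids = $updated;
        }

        return new self($points, $centroids, $iterations, false);
    }

    public function __serialize(): array
    {
        return [
            'points' => $this->points,
            'centroids' => $this->centroids,
            'iterations' => $this->iterations,
            'converged' => $this->converged,
        ];
    }

    public function __unserialize(array $data): void
    {
        $this->points = $data['points'];
        $this->centroids = $data['centroids'];
        $this->iterations = $data['iterations'];
        $this->converged = $data['converged'];
    }

    /** One Lloyd iteration: assign points to the nearest centroid, then recompute the means. */
    private static function step(array $points, array $centroids): array
    {
        $sums = [];
        $counts = [];
        foreach ($points as $point) {
            $nearest = 0;
            $best = INF;
            foreach ($centroids as $index => $centroid) {
                $distance = 0.0;
                foreach ($point as $d => $value) {
                    $distance += ($value - $centroid[$d]) ** 2;
                }
                if ($distance < $best) {
                    $best = $distance;
                    $nearest = $index;
                }
            }
            foreach ($point as $d => $value) {
                $sums[$nearest][$d] = ($sums[$nearest][$d] ?? 0.0) + $value;
            }
            $counts[$nearest] = ($counts[$nearest] ?? 0) + 1;
        }
        // Centroids that received no points keep their previous position.
        foreach ($sums as $index => $sum) {
            $count = $counts[$index];
            $centroids[$index] = array_map(fn ($s) => $s / $count, $sum);
        }

        return $centroids;
    }
}

// First run: seed with initial centroids, then iterate until convergence.
$points = [[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [8.8, 9.2]];
$first = (new ResumableResult($points, [[0.0, 0.0], [10.0, 10.0]]))->resume();
$serialized = serialize($first);

// later...
$previousRun = unserialize($serialized);
$result = $previousRun->resume([[1.1, 1.1], [9.1, 8.9]]);
```

In this sketch `resume()` returns a new object instead of mutating the previous one, so the earlier result stays usable; whether the real interface should work that way is an open design choice.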

bdelespierre avatar Sep 06 '21 10:09 bdelespierre

Sorry for the late reply. (I confirmed that it was committed to the PR.)

I think it's fine, but we'll need to do some more work to be more confident about the interface design.

battlecook avatar Sep 08 '21 14:09 battlecook

It's not implemented in #27. I plan to implement it later.

bdelespierre avatar Sep 09 '21 11:09 bdelespierre