numphp icon indicating copy to clipboard operation
numphp copied to clipboard

Convert kmeans from numpy to numphp

Open bluescreen opened this issue 6 years ago • 6 comments

Hi, i tried to port a kmeans algorithm from python to php using your numpy like library. Somehow I do not fully understand how to operate matrices correctly using your lib. Are the Parameters axis, keepdims of sum compatible to numpy? Can you help me get this code running with your lib? I put the Python version into comments.

function kmeans(NdArray $X, $K, $maxIter = 20, $beta = 1.0){
    list($N, $D) = $X->shape;

    $R          = np::zeros($N, $K); // Responsibility Matrix
    $exponents  = np::nulls($N, $K); // Exponents

    // initialize M with random values of X
    $M = randomize($X, $K, $N);


    for($i=0; $i< $maxIter; $i++){
        // step 1: determine assignments / resposibilities
        for($k=0; $k < $K; $k++){
            for($n=0; $n < $N; $n++){

                // Python:  exponents[n,k] = np.exp(-beta*d(M[k], X[n]))
                $exponents[$n][$k] = np::exp(-$beta * distance($M[$k], $X[$n]));

                // Python: R = exponents / exponents.sum(axis=1, keepdims=True)
                $R = $exponents->divide($exponents->sum());
            }
        }

        // step 2: recalculate means
        for($k=0; $k < $K; $k++){

            // Python: M = R.T.dot(X) / R.sum(axis=0, keepdims=True).T
            $M[$k] = np::transpose($R)->dot($X)->divide(np::transpose($R->sum()));
        }
    }

    echo $M; die;
}

function randomize(NdArray $X, $K, $N)
{
    $randomMeans = [];
    for($k=0;$k < $K; $k++){
        $randomMeans[$k] = $X[rand(0, $N-1)]->data;
    }
    return np::ar($randomMeans);
}

bluescreen avatar Jun 21 '19 11:06 bluescreen

Hi @bluescreen,

Several modifications have been made to help a k-mean implementation.

  • Axis and keepdims parameters are now supported for the sum() method.
$a = np::arange(9)->reshape(3, 3);
print $a;

// sum all
print $a->sum() . "\n";

// sum axis 0
print $a->sum(0);

// sum axis 1
print $a->sum(1);

// sum all and keepdims
print $a->sum(null, true);

// sum axis 0 and keepdims
print $a->sum(0, true);

// sum axis 1 and keepdims
print $a->sum(1, true);

// Another way is to call sum() method as a np utility and give it a
// matrix as first parameter.
// Below, it's a sum on axis 0 with keepdims = true
print np::sum($a, 0, true);

Would output:

[[0  1  2]
 [3  4  5]
 [6  7  8]]
36
[9   12  15]
[3   12  21]
[[36]]
[[9   12  15]]
[[3 ]
 [12]
 [21]]
[[9   12  15]]
  • Linalg norm() method
$a = np::ar([0, 1, 0]);
$b = np::ar([1, 1, 0]);

print $a;
print $b;

// Euclidean distance 
print np::linalg()->norm(
    $b->subtract($a)
);

Would output:

[0  1  0]
[1  1  0]
1
  • power() method
$a = np::ar([sqrt(2), 2, 3]);

print $a;

// square a matrix, element-wise
print $a->power(2);

Would output:

[1.4142135623731  2                3              ]
[2  4  9]
  • Array and matrix indexes now start at index 0
$a = np::arange(9)->reshape(3, 3);
print $a;

// print first element. both will output 0
print $a['0,0']; 
print $a->data[0][0];

// print last element. both will output 8
print $a['2,2']; 
print $a->data[2][2];

All these updates are available with composer require sciphp/numphp:dev-master and will be packaged with the next release (0.3.0).

SciPhp manual is not yet updated but will be soon.

Hope this helps.

landrok avatar Jul 01 '19 08:07 landrok

Thanks a lot. That will help. Only the np.random functions are missing but I think I can implement them.

And I suggest to keep the phpunit at version 7 in composer.json or update the tests. Otherwise there are lots of deprecated messages.

The @expectedException, @expectedExceptionCode, @expectedExceptionMessage, and @expectedExceptionMessageRegExp annotations are deprecated. They will be removed in PHPUnit 9. Refactor your test to use expectException(), expectExceptionCode(), expectExceptionMessage(), or expectExceptionMessageRegExp() instead.

bluescreen avatar Jul 01 '19 15:07 bluescreen

There also seems to be missing a case for dividing matrices vertically. When I try

$X = np::ar([[1,2,3],
                     [4,5,6],
                     [7,8,9]]);
$Y = np::ar([[1],[2],[3]]);
echo $X->divide($Y);

It throws InvalidArgumentException : Matrices are not aligned.

Result should be: 
 [[1.         2.         3.        ]
 [2.         2.5        3.        ]
 [2.33333333 2.66666667 3.        ]]

Could be implemented like this (not so efficient)

elseif ($m->ndim === $n->ndim && $shape_n[1] == 1) {

        list($cols, $rows) = $m->shape;
        $result = self::zeros($cols, $rows);
        for ($i = 0; $i < $cols; $i++) {
            for ($j = 0; $j < $rows; $j++) {
                $result["$i,$j"] = $m["$i,$j"] / $n["$i,0"];
            }
        }
        return $result;
    }

bluescreen avatar Jul 01 '19 18:07 bluescreen

NumPhp Manual

sum(), power() and indexing updates have been documented on https://sciphp.org

PHPUnit 8 warnings

  • Exceptions are now binded with the recommended functions
  • PHP 7.1+ uses assertEqualsWithDelta() instead of assertEquals() when appropriated (PHPUnit 7.5 and newest)
  • PHPUnit 5.7 and 6.5 are still supported, respectively for PHP 5.6 and PHP 7.0

Broadcasting

  • A np::broadcast_to() utility has been pushed and integrated to handle the missing case.
np::ar(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
)->divide(
    [[1],
     [2],
     [3]]
);

is equivalent to

np::ar(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
)->divide(
    [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
);

Would return, as expected

[[1                2                3              ]
 [2                2.5              3              ]
 [2.3333333333333  2.6666666666667  3              ]]

np::broadcast_to() is not yet documented as it has to handle more cases and integrated in some other methods.

Thanks for this nice catch 👍

landrok avatar Jul 03 '19 07:07 landrok

Great! Now only np.random functions are missing for the library to be used for machine learning stuff! Thanks a lot!

bluescreen avatar Jul 03 '19 12:07 bluescreen

You're right, random features are missing.

I saw that you implemented an interesting RandomTrait into bluescreen/numphp.

If you're interested in contributing to numphp, I would be very appreciative of it.

There are some little adjustments to make:

Random namespace

Random namespace should be SciPhp\Random, the RandomGenerator would become SciPhp\Random class.

Methods

Their implementations could be the same as LinAlg ones with a dedicated SciPhp/Random/ folder.

  • random() could be put there https://github.com/sciphp/numphp/blob/master/src/SciPhp/NumPhp/ExtensionsTrait.php
  • A SciPhp\Random wrapper like https://github.com/sciphp/numphp/blob/master/src/SciPhp/LinAlg.php
  • A SciPhp\Random\Decorator decorator to load traits https://github.com/sciphp/numphp/blob/master/src/SciPhp/LinAlg/Decorator.php
  • And then, the traits could reside in SciPhp\Random

If you want to discuss about other implementations details, feel free to contact me on https://cybre.space/@landrok

landrok avatar Jul 11 '19 22:07 landrok