Andrei Novikov comments

Results: 19 comments of Andrei Novikov

correcting rock

Hello @JosephChataignon, thank you for the pull request. I will take a look at the correction and run the tests for the Python code (your PR has been checked by integration tests...

A way to know the progress

Hello @Nithanaroy, currently the library does not provide such information. It is a good point to implement, and I will provide such a service.

Sampling Size

Hello @devshah96, clustering results depend on the input parameters. I don't know what your data looks like, but in the case of complex data it is not a trivial task to...

Sampling Size

@devshah96, each point is considered a separate cluster at the beginning, and the clusters are then merged step by step. The algorithm does not provide a **random sampling and partitioning** feature if you are...
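The bottom-up merging described in this comment can be sketched as follows. This is only an illustration, not the library's implementation: the function name `agglomerate` and the single-linkage merge criterion are assumptions chosen to keep the example short.

```python
from itertools import combinations


def agglomerate(points, k):
    """Hypothetical helper: merge clusters bottom-up until only k remain."""
    # Every point starts as its own cluster.
    clusters = [[p] for p in points]

    def sq_dist(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def linkage(c1, c2):
        # Single linkage (an assumption): distance of the two closest members.
        return min(sq_dist(a, b) for a in c1 for b in c2)

    while len(clusters) > k:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(agglomerate(data, 2))
```

Each pass over the cluster pairs is quadratic in the number of clusters, which is one reason sampling is often discussed for this family of algorithms.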

Sampling Size

@devshah96, it looks like supporting this feature is a good idea.

Better PAM initialization with BUILD

Hello @kno10, there is a kmeans++ based algorithm to initialize it, to which you are referring in your article: > In the experiments, we will also study whether a...

Better PAM initialization with BUILD

Thank you for the clarification. One more question: > I tried using your k-means++ initialization, but I believe it only accepts a data matrix, not a distance matrix, as input....
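For context on the quoted question, k-means++ style seeding needs only pairwise distances, so it can be driven by a distance matrix. The sketch below is a minimal illustration under that assumption. It is not the library's initializer, and the function name `kmeanspp_from_distance_matrix` and its parameters are hypothetical.

```python
import random


def kmeanspp_from_distance_matrix(dist, k, seed=None):
    """Hypothetical helper: pick k seed indices from a square distance matrix."""
    rng = random.Random(seed)
    n = len(dist)
    # The first seed is drawn uniformly at random.
    centers = [rng.randrange(n)]
    # Squared distance from every point to its nearest chosen seed.
    closest = [dist[centers[0]][i] ** 2 for i in range(n)]
    while len(centers) < k:
        total = sum(closest)
        if total == 0:
            # All remaining points coincide with a seed: take any unused index.
            candidate = next(i for i in range(n) if i not in centers)
        else:
            # Sample an index with probability proportional to `closest`.
            r = rng.random() * total
            acc = 0.0
            # Fallback for floating-point round-off: the farthest point.
            candidate = max(range(n), key=closest.__getitem__)
            for i, w in enumerate(closest):
                acc += w
                if r < acc:
                    candidate = i
                    break
        centers.append(candidate)
        closest = [min(closest[i], dist[candidate][i] ** 2) for i in range(n)]
    return centers


if __name__ == "__main__":
    pts = [0, 1, 10, 11]
    matrix = [[abs(a - b) for b in pts] for a in pts]
    print(kmeanspp_from_distance_matrix(matrix, 2, seed=0))
```

Already chosen seeds have weight zero, so they cannot be drawn again while any point with a positive distance remains.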

Better PAM initialization with BUILD

Hello @kno10, I have introduced the PAM BUILD algorithm in line with your article and I have optimized the C++ version (which should be used by default) of the K-Medoids (PAM) algorithm....
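As a rough illustration of the BUILD step mentioned here, the sketch below picks medoids greedily from a symmetric distance matrix. The first medoid minimizes the total distance to all points, and each further medoid is the candidate that most reduces the sum of distances to the nearest medoid. It is not the library's code, and the name `pam_build` is hypothetical.

```python
def pam_build(dist, k):
    """Hypothetical helper: greedy BUILD initialization on a distance matrix.

    Assumes `dist` is a square, symmetric list of lists.
    """
    n = len(dist)
    # First medoid: the point with the smallest total distance to all points.
    first = min(range(n), key=lambda c: sum(dist[c]))
    medoids = [first]
    # Distance from every point to its nearest medoid so far.
    nearest = list(dist[first])
    while len(medoids) < k:
        best, best_gain = None, -1.0
        for c in range(n):
            if c in medoids:
                continue
            # Total reduction in deviation if c were added as a medoid.
            gain = sum(max(nearest[i] - dist[c][i], 0) for i in range(n))
            if gain > best_gain:
                best, best_gain = c, gain
        medoids.append(best)
        nearest = [min(nearest[i], dist[best][i]) for i in range(n)]
    return medoids


if __name__ == "__main__":
    pts = [0, 1, 2, 10, 11, 12]
    matrix = [[abs(a - b) for b in pts] for a in pts]
    print(pam_build(matrix, 2))
```

Unlike random seeding, this selection is deterministic for a given matrix, at the cost of a quadratic scan for every medoid added.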

Better PAM initialization with BUILD

@kno10, I would be really pleased if you could provide the test code that you were using for performance testing.

Better PAM initialization with BUILD

@kno10 , I decided to repeat your experiment using `sklearn.datasets.fetch_20newsgroups_vectorized`. Results of the original PAM without distance matrix (list of points) still aren't good: ``` Medoids: [6953, 10558, 489, 1034,...