bico icon indicating copy to clipboard operation
bico copied to clipboard

BICO is a fast streaming algorithm and reduction technique for the k-means problem. It is a python implementation of the original code from http://ls2-www.cs.uni-dortmund.de/bico/ .

BICO (BIrch using COresets)

This is a python implementation of the original BICO <http://ls2-www.cs.uni-dortmund.de/bico/>_ code.

BICO is a fast streaming algorithm for the k-means problem. BICO reduces the number of input points to a specified amount such that the k-means solution on the compressed input set is still a high quality solution to the original data set. BICO combines a tree-like data structure of SIGMOND Test of Time Award winning algorithm BIRCH with reduction techniques from clustering theory.


Applications

Massive Data Sets

Running k-means on a very large data set is even nowadays infeasible. BICO can reduce the amount of input points drastically such that running k-means on the resulting reduced data set is just a matter of seconds. BICO ensures that the solution is of similar quality as the solution on the original input set.

Compression

BICO reduces a set of input points to a specified number of points while preserving the k-means solution quality as good as possible. Thus, BICO is a perfect tool to shrink the size of the data in order to reduce space or bandwidth.

Getting Started

Have a look at this example <https://github.com/gallmerci/bico/blob/master/examples/applications.ipynb>_ for an introduction on how to use this library and some visualizations presenting the results of the computed data reduction.

Slides showing a more technical overview of BICO can be found here <https://github.com/gallmerci/bico/blob/master/docs/BICO_Technical%20Overview.pdf>. If you are interested in the theoretical point of view of BICO, please feel free to check Section 5.4 of this thesis <https://eldorado.tu-dortmund.de/handle/2003/34099> which contains a very detailed description of the algorithm including all proofs of theoretical guarantees.