rusty-machine icon indicating copy to clipboard operation
rusty-machine copied to clipboard

ENH: implement Agglomerative (hierarchical) clustering

Open sinhrks opened this issue 7 years ago • 4 comments

This needs a discussion, because hierarchical clustering uses the same data for training and prediction. It doesn't meet very well with current SupModel trait.

Maybe we should have train_predict method to use same data for training and prediction?

sinhrks avatar Dec 20 '16 14:12 sinhrks

Awesome! Thank you so much for these PRs - sorry that it's taking me a little while to look through them properly as well. This is another algorithm I'm not super familiar with and so will need a little time to understand it.

There are definitely some issues with the current traits for learning. We've had discussions ( #124 ) about these before but it is hard to agree on the best. If you have anything to add I would love to hear your opinion.

As for this particular example, could you elaborate a little more on what exactly is lacking in the trait signature. From a brief look it seems that you train the model and return the clusters. The way I have handled this elsewhere is to use the UnSupModel trait and store the clusters within the model, providing an access function. And then the predict function will assign the new points to their closest cluster. This is a little messy but keeps things consistent until we can find a nicer way to link models and traits together.

AtheMathmo avatar Dec 20 '16 14:12 AtheMathmo

You're right. Renamed Metrics to Linkage.

sinhrks avatar Dec 20 '16 23:12 sinhrks

Sorry I forgot to comment on this again.

Thanks for making those changes - I think that this is probably ready to be merged now. Before I do merge I'd like to have a play with it and convince myself it is working as expected. Do you have any recommendations on how I can do this? Any data sets that lend themselves to this kind of model?

Edit: If you have no other suggestions then I'll merge #161 and try it out on the iris data set.

AtheMathmo avatar Dec 24 '16 16:12 AtheMathmo

iris is ok as test data. Here is R example( https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html). I'm adding tests once #161 has been merged.

Note: current test data is derived from a Japanese textbook in R.

Another point is training API. Currently, it doesn't impl UnSupModel. What do you think to change UnSupModel to:

    pub trait UnSupModel<T, U> {
        /// Predict output from inputs.
        fn predict(&self, inputs: &T) -> LearningResult<U>;

        /// Train the model using inputs.
        fn train(&mut self, inputs: &T) -> LearningResult<()>;

        /// Train the model using inputs, then predict outputs from the same inputs.
        fn train_predict(&mut self, inputs: &T) -> LearningResult<U> {
            self.train(inputs).unwrap();
            self.predict(inputs)
    }

AgglomerativeClustering should raise unimplemented! in predict/train. And should have custom impl for train_predict.

sinhrks avatar Dec 25 '16 01:12 sinhrks