rusty-machine
rusty-machine copied to clipboard
ENH: implement Agglomerative (hierarchical) clustering
This needs a discussion, because hierarchical clustering uses the same data for training and prediction. It doesn't meet very well with current SupModel
trait.
Maybe we should have train_predict
method to use same data for training and prediction?
Awesome! Thank you so much for these PRs - sorry that it's taking me a little while to look through them properly as well. This is another algorithm I'm not super familiar with and so will need a little time to understand it.
There are definitely some issues with the current traits for learning. We've had discussions ( #124 ) about these before but it is hard to agree on the best. If you have anything to add I would love to hear your opinion.
As for this particular example, could you elaborate a little more on what exactly is lacking in the trait signature. From a brief look it seems that you train the model and return the clusters. The way I have handled this elsewhere is to use the UnSupModel
trait and store the clusters within the model, providing an access function. And then the predict function will assign the new points to their closest cluster. This is a little messy but keeps things consistent until we can find a nicer way to link models and traits together.
You're right. Renamed Metrics
to Linkage
.
Sorry I forgot to comment on this again.
Thanks for making those changes - I think that this is probably ready to be merged now. Before I do merge I'd like to have a play with it and convince myself it is working as expected. Do you have any recommendations on how I can do this? Any data sets that lend themselves to this kind of model?
Edit: If you have no other suggestions then I'll merge #161 and try it out on the iris data set.
iris is ok as test data. Here is R example( https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html). I'm adding tests once #161 has been merged.
Note: current test data is derived from a Japanese textbook in R.
Another point is training API. Currently, it doesn't impl UnSupModel
. What do you think to change UnSupModel
to:
pub trait UnSupModel<T, U> {
/// Predict output from inputs.
fn predict(&self, inputs: &T) -> LearningResult<U>;
/// Train the model using inputs.
fn train(&mut self, inputs: &T) -> LearningResult<()>;
/// Train the model using inputs, then predict outputs from the same inputs.
fn train_predict(&mut self, inputs: &T) -> LearningResult<U> {
self.train(inputs).unwrap();
self.predict(inputs)
}
AgglomerativeClustering
should raise unimplemented!
in predict/train
. And should have custom impl for train_predict
.