aeon [DOC] Hierarchical, spectral, or density-based clustering using sklearn and aeon distance metrics

Describe the issue linked to the documentation

The clustering component in aeon currently supports only partition-based methods. However, there are also hierarchical, spectral, and density-based clustering methods [1].

Suggest a potential alternative/fix

Using the distance metrics in aeon, we can pre-compute the distance matrix for traditional clustering methods. Some methods are already implemented in sklearn, which is a core dependency of aeon and thus available to users. I think we should at least link to the sklearn clusterers in the documentation. With a bit more effort, we could provide examples of how to use sklearn's clusterers with aeon's distance measures (here).

hierarchical clustering:
- sklearn.cluster.AgglomerativeClustering with metric="precomputed"
density-based clustering:
- sklearn.cluster.DBSCAN with metric="precomputed"
- sklearn.cluster.OPTICS with metric="precomputed"
- Density Peaks? https://doi.org/10.1007/s10115-018-1189-7
spectral clustering:
- sklearn.cluster.SpectralClustering with affinity="precomputed" and an affinity matrix derived from the distance matrix, e.g. by inverting it (large values must indicate greater similarity)

I did not yet test this approach.

[1]: Paparrizos, John, and Luis Gravano. "Fast and Accurate Time-Series Clustering." ACM Transactions on Database Systems 42, no. 2 (2017): 8:1-8:49. https://doi.org/10.1145/3044711.

Feb 24 '24 14:02 SebastianSchmidl

Thanks for this. I think we have some examples of using precomputed distances with scikit-learn, but if it's not clear, it would be great to make it clearer. I would like to get density peaks in; iirc we have a Java implementation.

Feb 27 '24 13:02 TonyBagnall

Hey, can I work on this issue?

Jan 03 '25 16:01 SalmanDeveloperz

Yes, sure. @aeon-actions-bot assign @SalmanDeveloperz

Jan 04 '25 14:01 SebastianSchmidl

Hey, I’m working on this issue and would appreciate your guidance on a few points:

Where should I add the example? Should it go in an existing documentation file (if so, which one), or should I create a new file in the docs/ directory?
Are there any specific datasets or clustering algorithms you would like me to include in the examples (e.g., Agglomerative, Spectral Clustering)?
Is there a preferred format for the documentation (e.g., .md) or specific style guidelines I should follow?
Should I include the example code in a separate script or keep it embedded within the documentation file?

Once I have clarification, I’ll proceed with the implementation and submit a PR. Thank you for your guidance!

Jan 18 '25 17:01 SalmanDeveloperz

Please carefully read our Developer Guide and look at the existing documentation.

The examples are hosted at https://www.aeon-toolkit.org/en/latest/examples.html and the sources are here.
- [ ] You would need to create a new notebook for using aeon distances with sklearn clusterers and put it in the Clustering-section
- [ ] Change the reference to the new notebook in the clustering overview
- [ ] The overview notebook already links to Using aeon distances with scikit-learn, but the clustering part is too brief and lacks details. Add a link to the new notebook in the Clustering-with-sklearn.cluster section, too.
This issue already lists the required sklearn estimators. You can skip DensityPeaks because there is a separate issue for it, and it does not exist in sklearn.
The examples are Jupyter Notebooks, see developer guide and use the existing notebooks as guidance.
This should be clear from the above.

Jan 19 '25 20:01 SebastianSchmidl

Hey @SebastianSchmidl, how can I test my changes to the notebook locally before submitting a PR? Thanks

Jan 22 '25 11:01 SalmanDeveloperz

There is a documentation guide for building the docs locally on the website. For the notebooks themselves, there are lots of methods for running them, but I imagine most IDEs (e.g. PyCharm) will support it.

Jan 22 '25 14:01 MatthewMiddlehurst
