aeon
[DOC] Hierarchical, spectral, or density-based clustering using sklearn and aeon distance metrics
Describe the issue linked to the documentation
The clustering component in aeon currently supports only partition-based methods. However, there are also hierarchical, spectral, and density-based clustering methods [1].
Suggest a potential alternative/fix
Using the distance metrics in aeon, we can pre-compute the distance matrix for traditional clustering methods. Some methods are already implemented in sklearn, which is a core dependency of aeon and, thus, available to users. I think we should at least link to the sklearn clusterers in the documentation. With a bit more effort, we could provide examples of how to use sklearn's clusterers with aeon's distance measures (here).
- hierarchical clustering:
  - sklearn.cluster.AgglomerativeClustering with metric="precomputed"
- density-based clustering:
  - sklearn.cluster.DBSCAN with metric="precomputed"
  - sklearn.cluster.OPTICS with metric="precomputed"
  - Density Peaks? https://doi.org/10.1007/s10115-018-1189-7
- spectral clustering:
  - sklearn.cluster.SpectralClustering with affinity="precomputed" and the inverse of the distance matrix (large values indicate greater similarity)
I have not tested this approach yet.
[1]: Paparrizos, John, and Luis Gravano. "Fast and Accurate Time-Series Clustering." ACM Transactions on Database Systems 42, no. 2 (2017): 8:1-8:49. https://doi.org/10.1145/3044711.
Thanks for this. I think we have some examples of using precomputed distances with scikit-learn, but if they are not clear, it would be great to make them clearer. I would like to get density peaks in; IIRC, we have a Java implementation.
Hey, can I work on this issue?
Yes, sure. @aeon-actions-bot assign @SalmanDeveloperz
Hey, I’m working on this issue and would appreciate your guidance on a few points:
- Where should I add the example? Should it go in an existing documentation file (if so, which one), or should I create a new file in the docs/ directory?
- Are there any specific datasets or clustering algorithms you would like me to include in the examples (e.g., Agglomerative, Spectral Clustering)?
- Is there a preferred format for the documentation (e.g., .md) or specific style guidelines I should follow?
- Should I include the example code in a separate script or keep it embedded within the documentation file?
Once I have clarification, I’ll proceed with the implementation and submit a PR. Thank you for your guidance!
Please carefully read our Developer Guide and look at the existing documentation.
- The examples are hosted at https://www.aeon-toolkit.org/en/latest/examples.html and the sources are here.
- [ ] You would need to create a new notebook for using aeon distances with sklearn clusterers and put it in the Clustering-section
- [ ] Change the reference to the new notebook in the clustering overview
- [ ] The overview notebook already links to Using aeon distances with scikit-learn, but the clustering part is too brief and lacks details. Add a link to the new notebook in the Clustering-with-sklearn.cluster section, too.
- This issue already lists the required sklearn estimators. You can skip DensityPeaks because there is a separate issue for it, and it does not exist in sklearn.
- The examples are Jupyter Notebooks; see the developer guide and use the existing notebooks as guidance.
- This should be clear from the above.
Hey @SebastianSchmidl, how can I test my notebook changes locally before submitting a PR? Thanks
There is a documentation guide on the website for building the docs locally. For the notebooks themselves, there are lots of ways to run them, and I imagine most IDEs (e.g. PyCharm) will support it.