[DOC] Hierarchical, spectral, or density-based clustering using sklearn and aeon distance metrics

Open SebastianSchmidl opened this issue 1 year ago • 1 comments

Describe the issue linked to the documentation

The clustering component in aeon currently supports only partition-based methods. However, there are also hierarchical, spectral, and density-based clustering methods [1].

Suggest a potential alternative/fix

Using the distance measures in aeon, we can precompute the distance matrix for traditional clustering methods. Some of these methods are already implemented in sklearn, which is a core dependency of aeon and thus available to users. I think we should at least link to the sklearn clusterers in the documentation. With a bit more effort, we could provide examples of how to use sklearn's clusterers with aeon's distance measures (here).

I have not yet tested this approach.

[1]: Paparrizos, John, and Luis Gravano. "Fast and Accurate Time-Series Clustering." ACM Transactions on Database Systems 42, no. 2 (2017): 8:1-8:49. https://doi.org/10.1145/3044711.

SebastianSchmidl avatar Feb 24 '24 14:02 SebastianSchmidl

Thanks for this. I think we have some examples of using precomputed distances with scikit-learn, but if that is not clear, it would be great to make it clearer. I would like to get density peaks in; IIRC we have a Java implementation.

TonyBagnall avatar Feb 27 '24 13:02 TonyBagnall

Hey, can I work on this issue?

SalmanDeveloperz avatar Jan 03 '25 16:01 SalmanDeveloperz

Yes, sure. @aeon-actions-bot assign @SalmanDeveloperz

SebastianSchmidl avatar Jan 04 '25 14:01 SebastianSchmidl

Hey, I’m working on this issue and would appreciate your guidance on a few points:

  1. Where should I add the example? Should it go in an existing documentation file (if so, which one), or should I create a new file in the docs/ directory?
  2. Are there any specific datasets or clustering algorithms you would like me to include in the examples (e.g., Agglomerative, Spectral Clustering)?
  3. Is there a preferred format for the documentation (e.g., .md) or specific style guidelines I should follow?
  4. Should I include the example code in a separate script or keep it embedded within the documentation file?

Once I have clarification, I’ll proceed with the implementation and submit a PR. Thank you for your guidance!

SalmanDeveloperz avatar Jan 18 '25 17:01 SalmanDeveloperz

Please carefully read our Developer Guide and look at the existing documentation.

  1. The examples are hosted at https://www.aeon-toolkit.org/en/latest/examples.html and the sources are here.
  2. This issue already lists the required sklearn estimators. You can skip DensityPeaks because there is a separate issue for it, and it does not exist in sklearn.
  3. The examples are Jupyter notebooks; see the developer guide and use the existing notebooks as guidance.
  4. This should be clear from the above.
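One detail that is easy to miss for the spectral case: sklearn's `SpectralClustering(affinity="precomputed")` expects a similarity (affinity) matrix, not a distance matrix, so precomputed distances need to be converted first. A rough sketch using a Gaussian kernel follows; the helper name `distances_to_affinity`, the bandwidth `sigma`, and the hand-made distance matrix (a stand-in for the output of an aeon pairwise distance function) are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def distances_to_affinity(D, sigma=1.0):
    """Hypothetical helper: map distances to similarities in (0, 1].

    sigma is a bandwidth that has to be tuned to the scale of the distances.
    """
    D = np.asarray(D, dtype=float)
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))


# Stand-in for a precomputed distance matrix: two groups of five series,
# small distances within a group and large distances between groups.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 5)
same_group = groups[:, None] == groups[None, :]
D = np.where(
    same_group,
    rng.uniform(0.3, 0.7, size=(10, 10)),
    rng.uniform(2.5, 3.5, size=(10, 10)),
)
D = (D + D.T) / 2.0        # make the matrix symmetric
np.fill_diagonal(D, 0.0)   # distance of a series to itself is zero

A = distances_to_affinity(D, sigma=1.0)
spectral = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = spectral.fit_predict(A)
print(labels)
```

Other monotone transformations from distance to similarity would work as well; the Gaussian kernel is just a common choice.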

SebastianSchmidl avatar Jan 19 '25 20:01 SebastianSchmidl

Hey @SebastianSchmidl, how can I test my notebook changes locally before submitting a PR? Thanks

SalmanDeveloperz avatar Jan 22 '25 11:01 SalmanDeveloperz

There is a documentation guide on the website for building the docs locally. For the notebooks themselves, there are lots of ways to run them, and I imagine most IDEs (e.g. PyCharm) will support them.

MatthewMiddlehurst avatar Jan 22 '25 14:01 MatthewMiddlehurst