aeon [MNT] Update similarity search with new base classes : Query Search

Reference Issues/PRs

Part of #1243

What does this implement/fix? Explain your changes.

As described in #1243, this is the first PR that implements base classes for the query search task and moves the existing code into this new submodule.

Remaining TODOs:

[ ] Adapt distance profile function for unequal length
[ ] Add tests for unequal length for both dummy and top-k
[ ] Fix missing docstrings
[ ] Add typing to function/class parameters

PR checklist

For new estimators and functions

[X] I've added the estimator to the online API documentation.
[X] (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.

May 09 '24 14:05 baraline

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

May 09 '24 14:05 review-notebook-app[bot]

Thank you for contributing to `aeon`

I have added the following labels to this PR based on the title: [ $\color{#EC843A}{\textsf{maintenance}}$ ]. I have added the following labels to this PR based on the changes made: [ $\color{#45FD64}{\textsf{examples}}$, $\color{#006b75}{\textsf{similarity search}}$ ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

May 09 '24 14:05 aeon-actions-bot[bot]

Any reason for the new base class direction? Do the other base classes not have much shared functionality?

Jun 09 '24 15:06 MatthewMiddlehurst

> Any reason for the new base class direction? Do the other base classes not have much shared functionality?

Yeah, after working through the other two classes on pen and paper, the base class for similarity search would be pretty much empty. For example, index search will actually do things differently during fit, as it needs to build a similarity model, while series search in its most naive form (without computational optimisations like MP) simply loops a query search over all possible candidates. The computational optimisations also require some rethinking of fit compared to query search. So if we consider the three submodules, at least for now, I don't see a use for a BaseSimilaritySearch class. It's still possible to refactor afterwards if we find a good reason to.
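To make the "naive series search" point concrete, here is a minimal sketch, not the code in this PR. The function names `distance_profile` and `naive_series_search` are hypothetical, and plain Euclidean distance is assumed purely for illustration:

```python
import numpy as np


def distance_profile(series, query):
    """Euclidean distance between `query` and every window of `series` (hypothetical helper)."""
    windows = np.lib.stride_tricks.sliding_window_view(series, len(query))
    return np.linalg.norm(windows - query, axis=1)


def naive_series_search(series_a, series_b, length):
    """For each subsequence of `series_a`, run a query search against `series_b`.

    Returns the index and distance of the best match in `series_b` for every
    candidate of `series_a`. No optimisation is attempted: it is literally a
    loop over query searches.
    """
    a = np.asarray(series_a, dtype=float)
    b = np.asarray(series_b, dtype=float)
    best_index, best_distance = [], []
    for start in range(len(a) - length + 1):
        profile = distance_profile(b, a[start : start + length])
        j = int(np.argmin(profile))
        best_index.append(j)
        best_distance.append(float(profile[j]))
    return np.array(best_index), np.array(best_distance)
```

Nothing here needs a fitted model, which is why it shares so little with an index search that has to build one during fit.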

Jun 09 '24 15:06 baraline

On second thought, a BaseSimilaritySearch could call the preprocess function of the CollectionEstimator on the 3D data given to the fit method, but that's about it... Would that be better for structure's sake?

Jun 09 '24 15:06 baraline

Couple of things to consider but ofc can change later as you say.

If there are a significant number of shared parameters/attributes/functions used then it may be a good idea to keep even if it's mostly abstract methods.

Also there may be some situations where you want to use isinstance to cover all of them? Maybe not also, not thought about it that much 🙂.

Jun 09 '24 16:06 MatthewMiddlehurst

> If there are a significant number of shared parameters/attributes/functions used then it may be a good idea to keep even if it's mostly abstract methods.

For some reason, I forgot to consider this... I'm a bit out of touch today! Swapping the structure to add it back with the adjustment.

Jun 09 '24 17:06 baraline

Noticed an issue with the base class structure for the case of #1311, where the optimisation relies on lower bounding and does not return a fully computed distance profile, so I'll revamp it to be usable for all types of optimisations.

Jun 14 '24 14:06 baraline

Sorry for the mess in this PR... To summarize:

The previous BaseQuerySearch class was made to allow any matching condition on the output of the similarity search (e.g. top-k matches, all matches below a distance threshold, ...). In practice, there are only a few plausible conditions: top-k and/or threshold, and worst-k and/or threshold. So I just made a QuerySearch estimator with k, threshold and inverse_distance parameters that cover all these cases.
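As a rough illustration of how the three parameters can cover those cases, here is a minimal sketch. It is not the estimator from this PR: `select_matches` is a hypothetical helper, and the threshold semantics under `inverse_distance` (a lower bound on the distance) are an assumption made for the example.

```python
import numpy as np


def select_matches(distance_profile, k=1, threshold=None, inverse_distance=False):
    """Return indices of the matches in a 1D distance profile (hypothetical helper).

    k=None keeps every candidate that passes the threshold.
    """
    profile = np.asarray(distance_profile, dtype=float)
    # With inverse_distance, the largest distances count as the best matches.
    scores = -profile if inverse_distance else profile
    order = np.argsort(scores, kind="stable")[:k]
    if threshold is not None:
        # Assumption: the threshold is an upper bound on the distance for
        # top-k, and a lower bound on the distance for worst-k.
        limit = -threshold if inverse_distance else threshold
        order = order[scores[order] <= limit]
    return order


profile = [3.0, 0.5, 2.0, 0.1]
top_2 = select_matches(profile, k=2)                           # top-k
below = select_matches(profile, k=None, threshold=1.0)         # threshold only
worst_2 = select_matches(profile, k=2, inverse_distance=True)  # worst-k
```

With this shape, the matching condition is plain data on the estimator rather than a method to override, so any code computing the profile can read it.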

Additionally, there was the problem of computational optimisations that only compute part of the distance profile (e.g. DTW lower bounding). These optimisations need to know the matching condition (top-k, ...) to work. The previous structure would not have allowed that, as the distance profile computations happened in the BaseQuerySearch class, which didn't have access to the matching condition.
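A minimal sketch of why the pruning needs the matching condition, under stated assumptions: `top_k_with_lower_bound` is a hypothetical function, Euclidean distance stands in for DTW, and the reverse triangle inequality on vector norms stands in for a real DTW lower bound. The structure is what matters: a candidate can only be skipped by comparing its lower bound against the current k-th best distance, so k has to be known where distances are computed.

```python
import heapq

import numpy as np


def top_k_with_lower_bound(series, query, k=1):
    """Top-k query search that skips candidates using a cheap lower bound.

    Returns (matches, n_full): matches is a sorted list of (distance, index)
    and n_full counts how many full distance computations were needed.
    """
    series = np.asarray(series, dtype=float)
    query = np.asarray(query, dtype=float)
    m = len(query)
    query_norm = np.linalg.norm(query)
    best = []  # max-heap through negated distances: (-distance, index)
    n_full = 0
    for i in range(len(series) - m + 1):
        candidate = series[i : i + m]
        # | ||q|| - ||c|| | <= ||q - c||, so this never overestimates.
        lower_bound = abs(query_norm - np.linalg.norm(candidate))
        if len(best) == k and lower_bound >= -best[0][0]:
            continue  # cannot beat the current k-th best: skip the full distance
        distance = float(np.linalg.norm(candidate - query))
        n_full += 1
        if len(best) < k:
            heapq.heappush(best, (-distance, i))
        elif distance < -best[0][0]:
            heapq.heapreplace(best, (-distance, i))
    matches = sorted((-neg_distance, i) for neg_distance, i in best)
    return matches, n_full


series = [0, 0, 0, 1, 2, 3, 0, 0, 10, 10, 10]
matches, n_full = top_k_with_lower_bound(series, [1, 2, 3], k=1)
```

In this example the exact match at index 3 is found after four full distance computations, and the remaining candidates are skipped. The profile is never fully computed, so a base class that computes the whole profile first and applies the condition afterwards cannot host this kind of optimisation.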

Jun 16 '24 18:06 baraline
