aeon [ENH] BroadcasterTransformer to broadcast a series transformer over a collection

Currently, BaseTransformers can handle both single series and collections by internally "Vectorize" (i.e. broadcast) across channels and instances depending on the input data. This is legacy code which to me very confusing, involves huge amounts of redundant checks and is not maintainable by the current team. We should imo move away from this model.

This is a quick mock up of my favoured solution, for comment. I've not tested it properly yet, putting it up before I go too far down the line. Basically have a BroadcastTransformer that we pass a BaseSeriesTransformer which is applied over all instances in a collection.

Main points

It takes all its tags from the series transformer. These should then be checked in the base class
It does not broadcast over channels. If the SeriesTransformer cannot handle multivariate, nor can the Broadcaster.
If fit is empty is true in the SeriesTransformer, then Broadcaster only uses a single instance of the SeriesTransformer. If fit is empty is false, it clones it n_cases times and stores the transformers
it calls the private fit/transform of the SeriesTransformer. The assumption is that if it passes checks in the BaseCollectionTransformer, and the tags are taken from SeriesTransformer, then this should be fine. I think this should be true. If its not, we should make it so.

Feb 24 '24 16:02 TonyBagnall

Thank you for contributing to `aeon`

I have added the following labels to this PR based on the title: [ $\color{#FEF1BE}{\textsf{enhancement}}$ ]. I would have added the following labels to this PR based on the changes made: [ $\color{#41A8F6}{\textsf{transformations}}$ ], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

Feb 24 '24 16:02 aeon-actions-bot[bot]

A drawback of our current approach is that a transformer that needs to be fitted could only be broadcasted to a set of samples from the same size. For example if we had X1.shape = (5, 1, 100) used for fit and then transform, a set X2.shape = (10, 1, 100) could not be transformed, as we stored the 5 transformer fitted on X1. But I don't know if this would ever happen considering we are broadcasting a single series transformer, which by essence (if fit is not empty) fits and transform one series.

Mar 06 '24 14:03 baraline

Here is a draft for using joblib, need to add some tests and sort out tags for input format, as for now the example using MatrixProfileSeriesTransformer fails because it didn't convert the series to univariate as expected

Mar 06 '24 14:03 baraline

To-dos left for this PR before it is ready for review:

[x] Fix broadcaster test, seem to be failing some check in cases where fit_empty=True in the broadcasted series transformer
[x] Add new broadcaster transformer to the docs
[x] Add an example notebook on how to use the broadcaster for different use cases
[x] Add type hints when not a union of lot of possible input types

Mar 11 '24 16:03 baraline

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Mar 12 '24 08:03 review-notebook-app[bot]

included the coverage tag, just to make sure most of it is happening correctly

Mar 22 '24 11:03 MatthewMiddlehurst

Thanks for reviewing ! Concerning your questions:

So is the essentially intended to be the parallel of https://github.com/aeon-toolkit/aeon/blob/main/aeon/transformations/collection/_collection_wrapper.py?

Isn't _collection_wrapper.py a Collection to Series while the introduction of this PR is a Series to Collection ? (In the Collection to Series cases, there is no need for parallel I think ?)

I think we should attempt to unify design a bit for these, the collection version currently does not use the series transformer base class. Don't have to change the collection version in this PR of course. I prefer CollectionToSeriesWrapper or similar to the current name.

I agree, both Collection to Series and Series to Collection transformers should be close to each other. So It would be SeriesToCollectionWrapper and CollectionToSeriesWrapper (for _collection_wrapper.py, already implemented)

Mar 22 '24 11:03 baraline

Some notes on the updates :

Removed axis from Matrix Profile as it is expecting univariate data
Added data generation for 1D/2D series in testing utils
Added tests for BaseSeriesTransformer
Removed axis as input parameter for Dummies, this parameter should stay fixed as it represent the expected shape for the implemented methods
Modified dummy to affect random value per channel to test axis conversion
Renamed Broadcaster to SeriesToCollectionWrapper
Return the transformed data to the shape of the specified axis parameter in transform

Mar 23 '24 13:03 baraline

A drawback of our current approach is that a transformer that needs to be fitted could only be broadcasted to a set of samples from the same size. For example if we had X1.shape = (5, 1, 100) used for fit and then transform, a set X2.shape = (10, 1, 100) could not be transformed, as we stored the 5 transformer fitted on X1. But I don't know if this would ever happen considering we are broadcasting a single series transformer, which by essence (if fit is not empty) fits and transform one series.

I think if the transformer has a _fit function, then it needs to have been called on any case passed. I think calling transform on its own is not allowed, since is_fitted will be false

Mar 28 '24 08:03 TonyBagnall

taking another look here, I'll take a look at tags. Firstly though, I'm not sure about the name series_wrapper. To me at least, a wrapper contains a single object, whereas this broadcasts. Same with class name SeriesToCollectionWrapper, it seems to be doing to much in the name. Wrappers to me are simply containers for another object. This comment has " "Wrap a single series transformer over a collection." Which doesnt read properly to me. I prefer """Broadcast a single series transformer over a collection."

Mar 28 '24 09:03 TonyBagnall

looking at the tags, I think the only ones we dont want to take from the series transformer are

_tags = { "input_data_type": "Collection", "output_data_type": "Collection", "X_inner_type": "numpy3D", # or whatever the subclass sets it to "capability:multithreading": False,# or whatever the subclass sets it to

the other tags set in BaseCollectionTransformer are

# tag values specific to CollectionTransformers
_tags = {
    "input_data_type": "Collection",
    "output_data_type": "Collection",
    "fit_is_empty": False,
    "requires_y": False,
    "capability:inverse_transform": False,
}

and in BaseCollectionEstimator are _tags = { "capability:multivariate": False, "capability:unequal_length": False, "capability:missing_values": False, "X_inner_type": "numpy3D", "capability:multithreading": False, "python_version": None, # PEP 440 python version specifier to limit versions } I think all of these can be taken from the SeriesTransformer, it if has them. I cant think of a case when we would want them different in the collection. Except multithreading? Not looked at that yet. I would prefer to exclude these than maintain a list of _tags_to_inherit, which seems fragile to me. If a series transformer has a tag specific to them, there is no need for the Broadcaster to know about it. It does need to know the tags it insists on though

Mar 28 '24 09:03 TonyBagnall

also, we enforce calling fit by setting is fitted to true in the base class fit not the constructor

Mar 28 '24 09:03 TonyBagnall

so I think we have some issues with threading about cloning/not cloning and the old problem of where to set n_jobs. I propose that we get it in without threading first, then raise an issue and describe the problems @baraline @MatthewMiddlehurst

Mar 28 '24 10:03 TonyBagnall

Agreed, then we can benchmark it to select the best solution, I'll try to make the modification tomorrow if you didn't do it before me !

Mar 28 '24 10:03 baraline

I'll push some changes, but they are open for discussion

Mar 28 '24 15:03 TonyBagnall

there is in fact a problem calling _fit, since the X_inner_type of collections and series are fundamentally different, will unfortunately have to call base class fit

Mar 28 '24 19:03 TonyBagnall

replaced with #1632

Jun 09 '24 16:06 TonyBagnall

aeon aeon copied to clipboard

[ENH] BroadcasterTransformer to broadcast a series transformer over a collection

Thank you for contributing to aeon

aeon
aeon copied to clipboard

Thank you for contributing to `aeon`