
[ENH] make fit_predict_default configurable

Open TonyBagnall opened this issue 9 months ago • 0 comments

Describe the feature or idea you want to propose

Currently, fit_predict makes estimates on the train data by default through cross-validation. It hard codes the number of folds to 10, or to the minimum number of cases in one class if that is smaller. I would like to be able to set this to something other than 10; I am not immediately sure of the best way of configuring it.

It also always fits the whole model. I'd like to be able to turn that off.

The context is using fit_predict to score channels for channel selection. I would like it to be fast, so I want 3-fold CV and not to build the whole model.

Describe your proposed solution

A mocked-up fit for the channel selector:

        n_channels = X.shape[1]
        scores = np.zeros(n_channels)
        # Evaluate each channel with the classifier
        for i in range(n_channels):
            preds = self.classifier.fit_predict(X[:, i, :], y)
            scores[i] = accuracy_score(y, preds)
        # Select the top n_keep channels
        sorted_indices = np.argsort(-scores)
        n_keep = math.ceil(n_channels * self.proportion)
        self.channels_selected_ = sorted_indices[:n_keep]

Currently this builds 11 models per channel (the 10 cross-validation fits plus the full model fit), assuming each class has at least 10 cases. The current default implementation is:

    def _fit_predict_default(self, X, y, method):
        # fit the classifier
        self._fit(X, y)

        # predict using cross-validation
        cv_size = 10
        _, counts = np.unique(y, return_counts=True)
        min_class = np.min(counts)
        if min_class < cv_size:
            cv_size = min_class
            if cv_size < 2:
                raise ValueError(
                    f"All classes must have at least 2 values to run the "
                    f"_fit_{method} cross-validation."
                )

        random_state = getattr(self, "random_state", None)
        estimator = _clone_estimator(self, random_state)

        return cross_val_predict(
            estimator,
            X=X,
            y=y,
            cv=cv_size,
            method=method,
            n_jobs=self._n_jobs,
        )

Could do it with kwargs for fit_predict, maybe something like:

        for i in range(n_channels):
            preds = self.classifier.fit_predict(X[:, i, :], y, **{"cv_size": 3, "full_model": False})
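
A minimal sketch of the receiving side, assuming the kwargs get threaded through to _fit_predict_default (the cv_size and full_model names are just the ones from the snippet above, not an existing aeon parameter):

    def _fit_predict_default(self, X, y, method, cv_size=10, full_model=True):
        # only fit the final model on all of the data if requested
        if full_model:
            self._fit(X, y)

        # predict using cross-validation, capping the folds at the smallest class count
        _, counts = np.unique(y, return_counts=True)
        min_class = np.min(counts)
        if min_class < cv_size:
            cv_size = min_class
            if cv_size < 2:
                raise ValueError(
                    f"All classes must have at least 2 values to run the "
                    f"_fit_{method} cross-validation."
                )

        random_state = getattr(self, "random_state", None)
        estimator = _clone_estimator(self, random_state)

        return cross_val_predict(
            estimator,
            X=X,
            y=y,
            cv=cv_size,
            method=method,
            n_jobs=self._n_jobs,
        )

With cv_size=3 and full_model=False this would be 3 fits per channel instead of 11.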

Describe alternatives you've considered, if relevant

I could set it in the constructor, or pass it as an explicit parameter with a default of 10.
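
For comparison, the constructor route would configure the estimator once and leave the fit_predict signature alone (SomeClassifier, cv_size and fit_full_model are hypothetical names standing in for whichever classifier gains the parameters):

        # configure the CV behaviour once when building the classifier
        clf = SomeClassifier(cv_size=3, fit_full_model=False)  # hypothetical parameters
        for i in range(n_channels):
            preds = clf.fit_predict(X[:, i, :], y)
            scores[i] = accuracy_score(y, preds)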

TonyBagnall · May 08 '24 13:05