Give feedback when `svm.SVC` is configured with kernel hyperparameters for a different kernel
During our class we noticed that some students incorrectly configure hyperparameters that are irrelevant to the kernel in use, for example setting gamma when using a linear kernel. We think it could make sense for scikit-learn to give feedback to the user when non-effective settings are configured.
Describe the workflow you want to enable
from sklearn.datasets import load_iris
from sklearn.svm import SVC
x, y = load_iris(return_X_y=True)
clf = SVC(kernel='linear', gamma=1e-6)
clf.fit(x, y)
print(clf.score(x, y))
current output:
0.9933333333333333
proposed output, something similar to:
UserWarning: Gamma is set but not used because a linear kernel is configured.
0.9933333333333333
Thanks for the report, I agree we should even error in this case (instead of a warning), as a bug fix. The same goes for coef0 and for SVR.
PR welcome @PGijsbers
Currently, SVC ignores parameters depending on the kernel. For invalid combinations, I agree with @NicolasHug that we should error instead.
I think we should error when the parameter is not the default value and not compatible with the kernel. For example, if gamma!='scale' and kernel not in {'rbf', 'poly', 'sigmoid'}, we raise an error. This logic can extend to degree and coef0.
I'll try to make the changes this weekend 👍
Should the error be raised on fit, similar to incompatible configurations for LogisticRegression, or in __init__?
the validation should happen in fit (more details here if you're interested https://scikit-learn.org/stable/developers/develop.html#instantiation)
Also please make sure to write a non-regression test for the cases that are worth testing. The test should make sure that the error is now properly raised. Your snippet above is a great candidate.
Thanks!
This change breaks some tests, e.g. sparse_svm. Shall I go ahead and update those? Should I raise a DeprecationWarning first, or immediately set up a PR with ValueErrors?
I would prefer to deprecate first to be on the safe side. For the first PR, let's do this for gamma only, to see how other reviewers feel about it. Then we can have follow-up PRs for the other parameters.
I don't think there's anything to deprecate here: there's no feature, just a silent (minor) bug.
We could raise a temporary warning instead of an error, but for such bugfixes we tend to error directly. On top of that, the upcoming release is 1.0, so it'd be nice to include such changes directly.
Also, the failing tests mentioned above would have to be fixed whether we raise a warning or an error.
I'll update the PR when there is a new consensus (the PR has a FutureWarning instead of DeprecationWarning right now) :) just let me know
@thomasjpfan what made you change your mind from error to warning? The offending test sparse_svm is clearly wrong as it passes clf = svm.OneClassSVM(gamma=1, kernel=kernel) for all kernels.
I agree with Nicolas: the test is just being lazy.
The ignoring behavior is explicitly documented which makes me think it was intentional.
I do want to get to raising an error at some point and would be +1 on raising an error for 1.0.
Why not just go through the usual deprecation cycle? This is more user friendly than a breaking change.
The fact that the degree parameter is explicitly documented as ignored if kernel is not poly is a marker that the current behavior was intentional and should therefore not be considered a bug. Also, the current behavior can be (ab)used to do a simple yet efficient hyperparameter search over all the kernels and their parametrizations at once, using RandomizedSearchCV with distributions. Maybe that's an edge feature that few use, and the educational value of the warning (then error in the future) is more important.
Also, to be consistent, we should do the same for all 3 parameters (gamma, degree, coef0).
We should also issue the FutureWarning for SVC(gamma="auto", kernel="linear"), because the default is gamma="scale".
Since gamma="scale" is the default, it means that calling SVC(gamma="scale", kernel="linear") explicitly will not raise, which is a bit weird/surprising. We could use None as a default marker for gamma, degree and coef0, but it's a bit sad because then it's no longer possible to see the meaning of the default value just by reading the prototype of the function. One would have to read the parameters section of the docstring instead. No strong opinion on that last point.
Raising an error will not be an issue with SearchCV thanks to the error_score parameter.
Since gamma="scale" is the default, it means that calling SVC(gamma="scale", kernel="linear") explicitly will not raise, which is a bit weird/surprising. We could use None as a default marker for gamma, degree and coef0, but it's a bit sad because then it's no longer possible to see the meaning of the default value just by reading the prototype of the function.
I assume that we want to be consistent. But indeed, having gamma=None that would default to gamma="scale" if kernel="rbf" is semantically weird. I would expect the default to be "auto", meaning that it would default to a value (that is never None). However, for gamma, "auto" already means something else.
So I assume that we are left with:
- leave the code as it is, at the cost of non-expert users who don't look at the documentation trying non-meaningful sets of hyperparameters
- change the code to use opaque default values, where you need to read the documentation to know what the real default is
I am +0 on this. @ogrisel @NicolasHug could you give it a bit more thought and say what you think is best? Depending on the direction, my review process will be different in the PR :)
Why not just go through the usual deprecation cycle? This is more user friendly than a breaking change.
Looking at this again, if we were to change behavior, I am +1 on deprecation.
We could use None as a default marker for gamma, degree and coef0, but it's a bit sad because then it's no longer possible to see the meaning of the default value just by reading the prototype of the function.
Maybe that is a good thing? Currently, one needs to read the docs to know if the parameter is even active.
Also the current behavior can be (ab-)used to do simple yet efficient hyper-parameter search for all the kernel and there parametrizations at once using RandomizedSearchCV with distributions.
I think the current implementation is less efficient, because one can have search spaces with parameters that are ignored.
For example, with a search space kernel=['linear', 'poly', 'rbf', 'sigmoid'] x degree=[2, 3, 4, 5, 6], all the non-poly kernels will train the same model five times over. With error_score, the invalid combinations would exit early.
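This redundancy is easy to demonstrate with current scikit-learn: in the 4 x 5 grid below, degree only matters for the poly kernel, so the other 15 fits repeat already-trained models. (error_score is set to NaN to show where failed candidates would be skipped once invalid combinations raise.)

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 4 kernels x 5 degrees = 20 candidates, but degree is ignored
# unless kernel='poly'.
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "degree": [2, 3, 4, 5, 6],
}
search = GridSearchCV(SVC(), param_grid, cv=3, error_score=np.nan)
search.fit(X, y)

# All five linear-kernel candidates produce the same mean score.
linear_scores = {
    score
    for params, score in zip(
        search.cv_results_["params"], search.cv_results_["mean_test_score"]
    )
    if params["kernel"] == "linear"
}
print(len(search.cv_results_["params"]), len(linear_scores))
```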
I am +0.5 on deprecating the current behavior.
So the deprecation could be a good option.
@thomasjpfan what are your thoughts regarding the gamma parameter:
I assume that we want to be consistent. But indeed getting gamma=None that will default to gamma="scale' if kernel="rbf" is semantically weird. I would expect the default to be "auto" meaning that it would default to a value (that is never None). However, for gamma, "auto" is even meaning something else.
Using 'auto' to mean 1/n_features is kind of weird in itself. To move forward, we could also rename 'auto' to 'reciprocal_n_features'?
Hi, I wanted to work on this issue, but first I wanted to verify that my understanding is correct.
Description: a linear kernel separates data with a single line (a hyperplane in higher dimensions); its equation is K(x, z) = x · z. The gamma parameter defines how far the influence of a single training example reaches. Since the linear kernel does not create complex decision boundaries, adjusting gamma does not affect the model's behavior or the decision boundary, so to avoid confusion we want to point that out to users.
Current behavior: when SVC is configured with a linear kernel and gamma is set, scikit-learn silently ignores the gamma parameter without any warning or feedback.
Proposed Understanding and Solution: To enhance user experience and prevent misconfigurations, it would be beneficial for Scikit-Learn to provide feedback when irrelevant hyperparameters are set for a given kernel. Specifically, issuing a UserWarning when parameters like gamma, coef0, and degree are configured but not applicable to the selected kernel can guide users toward better practices.
Proposed Behavior: When irrelevant hyperparameters like gamma are set for a linear kernel, Scikit-Learn should issue a UserWarning to inform the user that the parameter is being ignored.
Proposed Output: UserWarning: The 'gamma' parameter has been set but is not relevant for the 'linear' kernel. It will be ignored. 0.9933333333333333
Proposed Code Changes: In the fit method of the SVC class, add checks for irrelevant hyperparameters and issue warnings accordingly.
Questions:
- Is my understanding correct that parameters like gamma, coef0, and degree are irrelevant when using a linear kernel in SVC?
- Is raising a UserWarning the most appropriate way to notify users about irrelevant hyperparameters, or should scikit-learn raise errors instead?
- Would handling parameter validation in __init__ be more effective for catching issues earlier, or is fit the appropriate place?
- What are the best practices for writing non-regression tests that ensure these warnings are correctly triggered without affecting other functionality?
FYI: I'm removing the "Easy" label to see if it helps maintain the quality of incoming PRs.