scikit-learn
ENH Add Additional Losses to MLPRegressor
This PR adds two additional loss options, MAE and log-cosh, to the MLPRegressor.
Yes, yes, I know that Torch/TensorFlow/etc. are where 'real' neural networks happen, but I work mostly in time series, where MLPs often outperform every other neural network approach. I have seen this myself when comparing methods in the NeuralForecast package, and it also shows up in papers released by major teams like Google. The nice thing about the scikit-learn implementation is that it just works, everywhere, and is already installed everywhere, so it is useful to have available as an option. However, time series data tend to be quite noisy, and on noisy data loss functions other than the default squared error tend to do better, so it is nice to have these extra options.
The functions added in this PR are very simple and are not expected to add any maintenance burden, as they are basic math functions. Almost everything this PR needed was already in place, and I tried to conform to the existing style and documentation as closely as I could.
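For concreteness, here is a minimal sketch of what the two proposed losses could look like, in the spirit of the per-loss helper functions in scikit-learn's private `sklearn.neural_network._base` module (the function names and exact signatures here are illustrative assumptions, not the PR's actual code):

```python
import numpy as np

# Hypothetical loss helpers in the style of scikit-learn's private
# sklearn.neural_network._base module (names are illustrative).

def absolute_error(y_true, y_pred):
    """Mean absolute error, averaged over samples and outputs."""
    return np.abs(y_true - y_pred).mean()

def log_cosh_error(y_true, y_pred):
    """Mean log-cosh error, computed in a numerically stable form:
    log(cosh(d)) = |d| + log1p(exp(-2|d|)) - log(2),
    which avoids overflow in cosh for large residuals."""
    d = np.abs(y_true - y_pred)
    return (d + np.log1p(np.exp(-2.0 * d)) - np.log(2.0)).mean()
```

The stable form matters because `np.cosh` overflows for residuals beyond roughly 700, while the `log1p` rewrite stays finite for any input.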
Tests, documentation, and style all considered and included :white_check_mark:
Let me know what else you would like done to make this merge ready!
✔️ Linting Passed
All linting checks passed. Your pull request is in excellent shape! ☀️
Had to add `loss` to MLPClassifier to make the tests work, but it only has the base `log_loss` and `binary_log_loss` options. Looks like all tests are passing except the changelog check.
Ready for review; only the changelog update is needed, and all other tests are passing.
Thanks a lot for this PR and putting in the effort to make it nicely formatted and tested. It sounds like you know that people are not super keen to add new functionality to the MLP classes.
I think a good next step is to see if there are people who want to actively object, after that we can start reviewing the work.
cc @scikit-learn/core-devs
Thanks for the contribution! I have also found that in a lot of cases, being able to quickly deploy a small MLP w/ scikit-learn is helpful (I used it a ton as baseline infrastructure when comparing more technical methods in research), and indeed alternative losses are important for many use cases.
Despite that, in general I am personally opposed to expanding our MLP side. I personally think it's a good candidate for "soft deprecation" where we just let it be, and promise to fix it if it breaks, but not include it for consideration for new features. There are myriad opinionated implementation details for MLPs that make them slightly more suited for different contexts, and I just don't want to invite that level of specificity. Deep learning, in general, has that trend of implementation slippery-slope.
Overall, between the concrete utility added by this PR vs the long-term role of MLP in scikit-learn, I'm -0.5 on this. If anyone feels differently and wants to advocate in favor of these changes, I won't object.
Thank you for your contribution, @winedarksea.
While I appreciate the effort, I am also -1, for the reasons given by @Micky774, unless there is significant demand for these loss functions.
I'm not -1 on adding losses to MLP. Just if we do so, we should use the losses that we already have in our private `_loss` module. They are already well tested and wouldn't incur additional maintenance burden for MLP.
I am not that worried about maintaining the implementation. I am more worried about extending, and then continuing to maintain, a public API that the maintenance team has considered to be on track for deprecation, by introducing another parameter (the `loss` proposed here). :/
But I won't strongly object to approvals of this PR.
However, I am not against a Cython implementation of the Log-Cosh which might be valuable for many implementations.
I was expecting these arguments on the big picture, and agree with them.
But I urge consideration of the small picture: you've got all this code already here, and this little change actually brings a big improvement in accuracy, driving value for years to come. And the maintenance expectation is, again, basically zero: it's np.log, np.cosh, and np.abs, not functions that I think are ever likely to see breaking changes (probably, hopefully).
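To illustrate the "basic math" claim: the backprop gradients of both proposed losses reduce to single elementary NumPy calls (a hypothetical sketch, not the PR's actual code; note NumPy has no `np.logcosh`, so log-cosh must be composed from `np.log`/`np.cosh` or a stable equivalent):

```python
import numpy as np

# Illustrative gradients of the proposed losses with respect to the
# network output, as they would appear in backpropagation.

def absolute_error_grad(y_true, y_pred):
    # d/dy_pred |y_pred - y_true| = sign(y_pred - y_true)
    return np.sign(y_pred - y_true)

def log_cosh_grad(y_true, y_pred):
    # d/dy_pred log(cosh(y_pred - y_true)) = tanh(y_pred - y_true)
    return np.tanh(y_pred - y_true)
```

One closed-form call each, which is what keeps the code-level maintenance near zero.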
And I am the only one (I believe) in years who has attempted a push for this, so I don't think there is much risk of opening the floodgates to more. Come yell at me later if I am wrong!
I think merging this would still be consistent with the expectations of this being a "softly" deprecated module.
(I could try that Cython log-cosh implementation if that is sufficient carrot.)
I'm -1 on log-cosh loss because we have it nowhere else (and because I think it's not a consistent scoring function, neither for the expectation nor for the median).
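The consistency concern above can be checked numerically: for a skewed target, the constant prediction that minimizes log-cosh matches neither the median (the MAE minimizer) nor the mean (the squared-error minimizer). A small sketch using a brute-force grid search:

```python
import numpy as np

# For y = [0, 0, 10]: median = 0, mean = 10/3. The log-cosh minimizer
# solves mean(tanh(c - y_i)) = 0, which lands near arctanh(0.5) ~ 0.55,
# i.e. at neither of the two classical functionals.

y = np.array([0.0, 0.0, 10.0])
c = np.linspace(-2.0, 5.0, 70001)  # candidate constant predictions
# mean log-cosh loss of predicting the constant c for every sample
loss = np.log(np.cosh(y[:, None] - c[None, :])).mean(axis=0)
c_star = c[np.argmin(loss)]
```

So log-cosh estimates a functional of the target distribution that is neither the conditional mean nor the conditional median, which is the sense in which it is not a consistent scoring function for either.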
I would be fine with that, as MAE is the main improvement anyway (for noisy data).
(actually a quantile loss is the best I've seen, but I didn't attempt to add that as it would be a much larger change)
> But I urge consideration of the small picture: you've got all this code already here, and this little change actually brings a big improvement in accuracy
This is true.
> ...driving value for years to come.
This, however, is a bit more dubious. I understand the sentiment, but we cannot really predict what value folks will derive from this. We know it will benefit your use case, but even if other folks could benefit from it, it doesn't mean that they will. MLP is not very popular in scikit-learn, which means all the value added is less likely to be discovered, let alone actually used and translated into real gain.
> And the maintenance expectation is, again, basically zero: it's np.log, np.cosh, and np.abs, not functions that I think are ever likely to see breaking changes (probably, hopefully).
You are right about the literal code maintenance, but that is not the maintenance we are protecting against. The code is simple -- nearly trivial, really (that's good). The real danger is in the expectations this opens up. We greatly value consistency and predictability within scikit-learn. To accept this PR would signal several things, including the following:
- We are open to new features in MLP (general consensus is against this)
- We are open to the `log-cosh` loss elsewhere (not yet discussed, definitely not agreed upon)
- We are open to extending the MLP API (we did so for `loss`, so why not for `<insert your creative MLP feature here>`?)
> And I am the only one (I believe) in years who has attempted a push for this, so I don't think there is much risk of opening the floodgates to more. Come yell at me later if I am wrong!
Now, you are right, we have not had many other folks push this in particular, but we very regularly have tons of issues, comments, PRs, etc. about small modifications that we do need to turn down. Everybody has something in the ML world they'd love to see represented in scikit-learn. Even just the signal that the MLP side of things is open to development would invite many folks with ideas ranging from reasonable to impressively niche. While we appreciate the contributions, we simply cannot accept all of them (or even most of them). Every "no" from us comes with a lot of consideration and discussion, and that is hours taken from other tasks. Our greatest scarcity is in maintainer time.
The maintenance weight is not of code repair, nor of code review, but rather of conversation, consideration, and (often) rejection. These do take a lot of time and energy. It may seem aggressively conservative (and maybe frustratingly so) to be averse to including proper, well-written code in fear of these indirect consequences, but it's much more of a social/political burden than an actual code burden when it comes to the maintenance we are trying to avoid.
Edit: Even if we drop log-cosh, the other points stand.
P.S.
Regardless of the decision on this particular PR, I do hope we see you again as a contributor for any other part you are interested in or find relevant to your practice. We value folks with experience, and especially folks that understand the real-world problems that our tools are meant to deal with. This conversation is, in my opinion, valuable and hopefully does not come across as a negative thing :)
So I guess the conclusion is that we're closing this as out of scope? Please comment if it's otherwise.