ExplainaBoard
ExplainaBoard copied to clipboard
Use 'confidence' instead of deprecated 'alpha' for scipy.stats.t.interval
Reduced heavy logging uncovered buried DeprecationWarnings in tests. We get the following DeprecationWarning in the tests that invoke scipy.stats.t.interval
method:
test_hits (explainaboard.tests.test_metric.TestMetric) ... /home/runner/work/ExplainaBoard/ExplainaBoard/explainaboard/metrics/metric.py:338: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
This PR fixes the warning as the warning suggests.
It seems better not to fix this DeprecationWarning
since the warning is generated since scipy 1.9.0 (latest version). Before 1.9.0, alhpa
was used. This can be a breaking change unless we set the minimum version of scipy to support. Currently, setup.py
doesn't specify the minimum version of scipy.
This is a hard decision, I'm not sure if we want to limit ourselves to only being compatible with the most recent version of scikit-learn
. But also DeprecationWarnings would be good to get rid of as well.
@tetsuok @neubig At least we need to fix a version of libraries we rely on. Setting deprecation warnings is a common process to remove a feature from the library, but it usually takes 2-3 minor releases to remove the feature completely. It shouldn't be a problem if we set appropriate versions.
That's a good point @odashi .
I'm a bit conflicted about this though. If we completely fix the version of libraries that we rely on (a ==
dependency), then this can cause the libraries to get more and more out of date. Because of this I've preferred specifying a minimum version (a >=
dependency). We've also been a bit lenient on the version number (allowing slightly older versions) to reduce the number of conflicts that ExplainaBoard causes with other libraries that may be installed in the environment.
I don't have the best answer, but I think we should carefully choose a policy and implement it.
>=
indicates that we require the newest version (unless other restrictions, such as <=
), and it should be used only when we know that every new versions never break public interfaces or such change doesn't affect our library.
According to semvar that many libraries follow, I think the recommended policy is:
- use
~=N.x
or~=N.x.y
if the library has a major version (N>=1
). It automatically increases both minor/patch numbers (~=N.x
) or only the patch number (~=N.x.y
) that may not involve breaking changes. - use
==
if the library is beta (0.x.y
) since increasing the version may have any kind of changes.
Updating versions of depended libraries needs explicit maintenance (it basically can not be done automatically) so it is our responsibility.
According to the SciPy's log (https://github.com/scipy/scipy/pull/16389) it attempts to introduce breaking changes in the minor version (1.11.0
). So we need to fix minor versions too for this library (~=1.N.0
)
Thanks @odashi !
I agree that we should avoid breaking anything, but at the same time ==
or ~=
dependencies may result in our code requiring out of date libraries. This is problematic for two reasons:
- The older libraries may be slower, less secure, or less featureful.
- Specifying a specific library version (especially an older one) can make it harder to find a set of libraries that doesn't have any conflicts with other libraries in the environment.
Because of these issues, I've tended to go with >=
dependencies and then fix errors when they occur, but I agree that this is not the best for library stability. Do you have a good method to overcome the issues of libraries going out of date while using ==
or ~=
dependencies? If one exists I'd be happy to go with that.
@neubig The ~=N.x
specifier (fixing the major version, auto-update minor/patch versions) does never cause any breakages on libraries that follows semantic versioning. Others are case-by-case (no silver-bullet solution): we have to investigate how each library is managed to determine appropriate version specifiers.
The best solution here is to choose a version specifier that won't involve breaking changes. Fixing the major version is enough for semvar-based libraries. We realized that the minor version should also be fixed for SciPy as discussed above.
Security updates are basically applied by increasing a weaker version number unless it requires overall refactoring, so ~=
specifier should cover most cases of security updates.
There are also GitHub-official bot to detect serious security issues. Since the bot covers most of major libraries, following alerts generated by this bot can prevent most of known issues.
For efficiency/features, we need to treat these kind of updates as a usual feature request to this library.
OK, I agree with this strategy. I guess we should refactor the setup.py/requirements files appropriately then. I'll create an issue.
Blocked by #472
This is ready for review because #563 pinned the version of scipy using the '~=' operator.