pylint
pylint copied to clipboard
Add a specifier for alerts that are linked to a library
Current problem
I'm trying to find a solution to specify an alert which is specific to a given library and version of that library.
An example is the pylint-ml plugin which has checkers associated with the Pandas library.
https://github.com/pylint-dev/pylint-ml/issues/23
Desired solution
When creating a checker, I would like to be able to add
- which library the code smell/alert is associated to
- which version of the lib ?below/exact/above? it is linked to
The question I have is how can we check which lib version the user is using or does this need to be configured manually? I am open to suggestions :)
Additional context
No response
Hi @PeterHamfelt perhaps this could be relevant to your question -
The question I have is how can we check which lib version the user is using or does this need to be configured manually? I am open to suggestions :)
>>> from importlib.metadata import version
>>> version("pandas")
'2.2.1'
>>>
https://docs.python.org/3/library/importlib.metadata.html
Thank you Marc ! I think we also need to specify how we handle the message declaration of the lib version in the checker (this is already done for python version we can build on it).
Example of use for the current minversion:
from pylint.checkers import BaseChecker
class AsyncChecker(BaseChecker):
msgs = {
"E1700": (
"Yield inside async function",
"yield-inside-async-function",
"Used when an `yield` or `yield from` statement is "
"found inside an async function.",
{"minversion": (3, 5)},
),
}
... But we must also declare what lib(s) is (are) affected. So possible data structure:
from pylint.checkers import BaseChecker
class PandasNumpyChecker(BaseChecker):
msgs = {
"E0042": (
"Some pandas/numpy version specific message",
"pandas-numpy-message",
"Some pandas/numpy version specific message rational",
{"minversion": (3, 5), "libversion": {
"pandas" : {"minversion": "1.0.1", "maxversion": "2.0.1"}, # version could be tuple
"numpy": {"minversion": "1.0.1}, #, but that's not what importlib.metadata.version returns
}},
),
}
- What we do when the lib is not installed (warn the user, or do nothing, I can see both being useful.) For example in pylint-django is django is not installed, then something went very wrong and a fatal error message could be raised !
from pylint.checkers import BaseChecker
class DjangoChecker(BaseChecker):
# ....
msgs = {
#...
"libversion": {
"django" : {"onmissing": "fatal"}, # if None nothing is done, if "fatal" a fatal error message is raised.
}
}
Let me know what you think about this spec proposal ;)
Looks good @Pierre-Sassoulas π
Hi @mbyrnepr2, @Pierre-Sassoulas ,
I apologize for the delayed response. I've reviewed the suggested setup for specifying library versions and I'm genuinely impressed. The structure you proposed:
"libversion": { "pandas": {"minversion": "1.0.1", "maxversion": "2.0.1"}, "numpy": {"minversion": "1.0.1"} }
This approach, especially considering versions as a range (with minimum and maximum bounds), seems both robust and flexible. I believe this could be particularly advantageous for pylint-ml. Given that each checker might depend on specific library versions, and these dependencies are subject to evolve over time, integrating this implementation would significantly enhance the analyzer's adaptability and accuracy.
I'm looking forward to discussing this further and exploring how we can incorporate this feature.
How do we handle non semver
versions in dependencies?
Good point, we must use string in the data structure not tuple and then handle the complexity internally. Other versioning schemes I know are:
- Date with (month and days ?) + character not in alpha/beta/rc/etc. (hash?)
2022.3.23f1
(unity). This looks almost like proper semver.. - Rounded up date 24.1a1) (black alpha versions), but the actual version have the months (https://github.com/psf/black/releases/tag/23.12.0)
- Improper semver 2.3.1, then 2.4, then 2.4.1(from antique colleague with a perl background, hi Benoit π <3)
We can't use use alphabetical ordering on the string because we need to detect that 9.9.9
is less than 10.0.0
, and 2022.9.23f1
is less than 2022.10.23f1
and 24.1a1
is less than 24.9.1
which is less than 24.10.1
. I guess it's split on the dot and then try to cast to int. If the versioning scheme does not have separation with dot, then we try to sort it alphabetically. If sorting it alphabetically does not result in something chronological then we don't support it ? Do such schemes exists and do they deserve pylint's checkers π ? (This part is probably going to be fun.)
I think we should utilize something like https://github.com/pypa/packaging.
On the one hand, yeah definitely. On the other hand I'd really hate to add a new dependency. Maybe we can live with supporting only semver or semver adjacent.
Maybe this doesn't address your concern Pierre but maybe it could be added as an extra dependency since users of Pylint itself wouldn't need it; whereas the plug-in dependency on Pylint would be pylint[packaging]. ( Similar to Celery in the way that if you need a redis task queue you would install as celery[redis].)
On the one hand, yeah definitely. On the other hand I'd really hate to add a new dependency. Maybe we can live with supporting only semver or semver adjacent.
If pylint does not support lib-version dependency, what would be the recommended approach to handle this locally for pylint plugins?
it could be added as an extra dependency
If we add it as an extra, we'll still have to deal with all the issue associated with a new dependency (for dill we had to track an issue for months, then Daniel had to create our own version of dill so we can release pylint in time for python 3.12). My main concern is the maintenance cost and the bloat. The complexity seems high even if we only take what we need and vendor it in pylint : https://github.com/pypa/packaging/blob/32deafe8668a2130a3366b98154914d188f3718e/src/packaging/version.py#L117-L146 / https://github.com/pypa/packaging/blob/32deafe8668a2130a3366b98154914d188f3718e/src/packaging/version.py#L200-L202
What do you mean by "lib-version dependency" @PeterHamfelt ?
I understand your concern @Pierre-Sassoulas. As this issue mainly adheres to pylint-ml, I'm interested in implementing simple version checks for dependencies, such as checking the pandas version which follows the "x.x.x" format (https://pandas.pydata.org/docs/whatsnew/index.html). Initially, at least.
Since such checks are not currently supported by pylint, what would be the recommended generic approach for incorporating this type of functionality into pylint plugins?
I'm interested in implementing simple version checks for dependencies, such as checking the pandas version which follows the "x.x.x" format
That would be semver, It's the most common versioning scheme, if we do the feature we're going to support this whatever we choose :)
Let's vote on what to do. Please let me know if you think of another solution, I'll add it to the vote :
- support semver only (i.e. : split on
.
cast to int, use the int tuple for ordering version, fail if anything unexpected happens), without any new dependency π - support semver and "close to semver" (i.e. : split on
.
, try to cast to int, use int tuple or string tuple or full string for ordering versions), without any new dependency π - vendor what pypa/packaging is doing to support everything but without dependency β€οΈ
- Add pypa/packaging as a dependency and support everything π
- semver only in pylint core by default, use dependency injection on the plugin side π
"Add pypa/packaging as a dependency and support everything π" In my opinion, in the best of worlds, this would always be the best choice. However, is it complex to implement? Worth it?
@jacobtylerwalls do you have an opinion on the subject ?
I'm for adding pypa/packaging. It looks stable, small footprint, and already supports python 3.13. dill is a much more sensitive subject area. I don't think we're taking on a big risk to add it.
Sorry for not stating that earlier, probably due to lack of sleep, but two reasons why I'm reluctant to add a dependency if it's possible to avoid it, are:
Dependencies are vastly more disruptive in pylint than in any other projects. pylint must be installed alongside a project's own dependencies to properly analyses a project. So it means pylint dependencies can conflict with the project's (i.e. we want pypa/packaging > 1
in a project requiring pypy/packaging < 1
and pylint then can't be installed at all. (For example, because of a clash with typing_extensions
(*) pylint was unusable on pytorch's projects for multiple months).
Also, one first step to increase the security of a package is to set the dependencies to exact versions (so that a new corrupted version of pypa/packaging
might not retro-actively compromise all the old version of pylint). It means we'd want to be able to require pypa/packaging
== 1.0.1 exactly and pylint would then clash with not only pypy/packaging < 1
but also with pypy/packaging > 1.01
. Pylint could get financial support from external organisms if we do this kind of thing, but in the past I said "not reasonable" (considering the point above, pylint would often become uninstallable), and they said " π° β. π ! ". Something to consider if we want to send a signal that we're interested by sponsoring from big actors.
So π / β€οΈ would be my personal choice.. anyway if the votes stays what it is with the new info, I'll disagree and commit :)
(*) in this case pytorch was requiring a specific old version, but this could happens more often if more maintainers choose to set dependencies to specific versions because of point 2
I suppose we could go forπ orπ initially and if future plugins need the added functionality of the dependency, it could be considered then (we could change our mind)?
Can we delegate all of this decision making to the plugin?
Nice π§ ! Yes, permitting to inject a comparison function on the plugin side and provide semver by default in pylint "core" would be a really nice design.
from pylint.checkers import BaseChecker
def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool:
return min_version < actual_version < max_version
class PandasSomeEsotericLibChecker(BaseChecker):
msgs = {
"E0042": (
"Some pandas/numpy version specific message",
"pandas-numpy-message",
"Some pandas/numpy version specific message rational",
{"minversion": (3, 5), "libversion": {
"pandas" : {
"minversion": "1.0.1",
"maxversion": "2.0.1",
"onmissing": "fatal",
"versioncomparator": None
},
"someesotericlib": {
"minversion": "duck",
"maxversion": "spoon",
"versioncomparator": _esoteric_lib_comparator
},
}},
),
}
not sure about "versioncomparator", sounds more like a check to see if the installed lib is in range. so "versionvalidator" ?
Nice π§ ! Yes, permitting to inject a comparison function on the plugin side and provide semver by default in pylint "core" would be a really nice design.
from pylint.checkers import BaseChecker def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool: return min_version < actual_version < max_version class PandasSomeEsotericLibChecker(BaseChecker): msgs = { "E0042": ( "Some pandas/numpy version specific message", "pandas-numpy-message", "Some pandas/numpy version specific message rational", {"minversion": (3, 5), "libversion": { "pandas" : { "minversion": "1.0.1", "maxversion": "2.0.1", "onmissing": "fatal", "versioncomparator": None }, "someesotericlib": { "minversion": "duck", "maxversion": "spoon", "versioncomparator": _esoteric_lib_comparator }, }}, ), }
not sure about "versioncomparator", sounds more like a check to see if the installed lib is in range. so "versionvalidator" ?
This looks good to me ππΌ
Do we have a decision? Starting with a simpler version handling (π / β€οΈ) and in the future aim to add pypa/packaging as a dependency? It would be great to have this in place when I start to add more and more checkers into pylint-ml π
Can versioncomparator
becomes something like Callable[[], bool] | None
where the callable should just do everything necessary to do the version comparison in the callable?
I'm not sure what you mean by that, could you upgrade the _esoteric_lib_comparator
example for it to be a Callable[[], bool] | None
? I suppose we must provide target min/max and actual version to the injected function (i.e. def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool:
?
Just make:
def _esoteric_lib_comparator() -> bool: ...
If we need to pass the version we need to have a way to get the version of a package for which we might do importlib
or packaging
magic. Let's just also offload that to the plugin?
Just make libversion
an object that we can iterate over and call whatever is under versioncomparator
?
Ha, right, I understand. Are there a lot of other ways to recover the version ? Maybe we could provide another injection point for that if that's the case:
import antigravity
from pylint.checkers import BaseChecker
def _esoteric_lib_version_comparator(actual_version: str, min_version: str, max_version)-> bool:
return min_version < actual_version < max_version
def _esoteric_lib_version_recoverer() -> str:
return antigravity.push("elephant")
class PandasSomeEsotericLibChecker(BaseChecker):
#...
"someesotericlib": {
"minversion": "duck",
"maxversion": "spoon",
"versioncomparator": _esoteric_lib_version_comparator,
"versionrecoverer": _esoteric_lib_version_recoverer,
},
That way you don't have to redefine the version recovering if you want to handle something else than semver. We'd have to implement the default way to do it in pylint so versionrecoverer
is optional and convenient to use.
If we need to pass the version we need to have a way to get the version of a package for which we might do importlib or packaging magic. Let's just also offload that to the plugin?
I think this is too paranoid. :D importlib
is in the stdlib, and Mark already showed a snippet I trust to get the version. Let's keep the API for this as small as we can.
I mean, the docs are full of warnings and caveats. I would personally prefer delegating all of this to plugins by making the API have no arguments at all. We can always add an argument later if plugins can't get the data they need themselves. Removing arguments later is much more of a hassle.
Sorry, I'm not grokking the caveats. Do you have one in that doc you might point me to?
"Specifically, it works with distributions with discoverable dist-info or egg-info directories, and metadata defined by the Core metadata specifications." "Important
These are not necessarily equivalent to or correspond 1:1 with the top-level import package names that can be imported inside Python code."
I'm mostly worried about an influx of issues about "my obscure poorly packaged python library doesn't work with pylint" etc. I don't really see why we need to pass so much to the plugins why they could call importlib
themselves as well. That also means that the plugin for a specific package is also responsible for determining whether that package is in accordance with all the packaging standards.