pylint icon indicating copy to clipboard operation
pylint copied to clipboard

Add a specifier for alerts that are linked to a library

Open PeterHamfelt opened this issue 3 months ago β€’ 36 comments

Current problem

I'm trying to find a solution to specify an alert which is specific to a given library and version of that library.
An example is the pylint-ml plugin which has checkers associated with the Pandas library. https://github.com/pylint-dev/pylint-ml/issues/23

Desired solution

When creating a checker, I would like to be able to add

  • which library the code smell/alert is associated to
  • which version of the lib ?below/exact/above? it is linked to

The question I have is how can we check which lib version the user is using or does this need to be configured manually? I am open to suggestions :)

Additional context

No response

PeterHamfelt avatar Apr 04 '24 18:04 PeterHamfelt

Hi @PeterHamfelt perhaps this could be relevant to your question -

The question I have is how can we check which lib version the user is using or does this need to be configured manually? I am open to suggestions :)

>>> from importlib.metadata import version
>>> version("pandas")
'2.2.1'
>>> 

https://docs.python.org/3/library/importlib.metadata.html

mbyrnepr2 avatar Apr 05 '24 11:04 mbyrnepr2

Thank you Marc ! I think we also need to specify how we handle the message declaration of the lib version in the checker (this is already done for python version we can build on it).

Example of use for the current minversion:

from pylint.checkers import BaseChecker

class AsyncChecker(BaseChecker):
    msgs = {
        "E1700": (
            "Yield inside async function",
            "yield-inside-async-function",
            "Used when an `yield` or `yield from` statement is "
            "found inside an async function.",
            {"minversion": (3, 5)},
        ),
    }

... But we must also declare what lib(s) is (are) affected. So possible data structure:

from pylint.checkers import BaseChecker

class PandasNumpyChecker(BaseChecker):
    msgs = {
        "E0042": (
            "Some pandas/numpy version specific message",
            "pandas-numpy-message",
            "Some pandas/numpy version specific message rational",
            {"minversion": (3, 5), "libversion": {
                "pandas" : {"minversion": "1.0.1", "maxversion": "2.0.1"}, # version could be tuple
                "numpy":  {"minversion": "1.0.1}, #, but that's not what importlib.metadata.version returns
            }},
        ),
    }
  • What we do when the lib is not installed (warn the user, or do nothing, I can see both being useful.) For example in pylint-django is django is not installed, then something went very wrong and a fatal error message could be raised !
from pylint.checkers import BaseChecker

class DjangoChecker(BaseChecker):
   # ....
   msgs = {
       #...
       "libversion": {
           "django" : {"onmissing": "fatal"}, # if None nothing is done, if "fatal" a fatal error message is raised.
       }
   }

Let me know what you think about this spec proposal ;)

Pierre-Sassoulas avatar Apr 05 '24 22:04 Pierre-Sassoulas

Looks good @Pierre-Sassoulas πŸ‘Œ

mbyrnepr2 avatar Apr 06 '24 05:04 mbyrnepr2

Hi @mbyrnepr2, @Pierre-Sassoulas ,

I apologize for the delayed response. I've reviewed the suggested setup for specifying library versions and I'm genuinely impressed. The structure you proposed:

"libversion": { "pandas": {"minversion": "1.0.1", "maxversion": "2.0.1"}, "numpy": {"minversion": "1.0.1"} }

This approach, especially considering versions as a range (with minimum and maximum bounds), seems both robust and flexible. I believe this could be particularly advantageous for pylint-ml. Given that each checker might depend on specific library versions, and these dependencies are subject to evolve over time, integrating this implementation would significantly enhance the analyzer's adaptability and accuracy.

I'm looking forward to discussing this further and exploring how we can incorporate this feature.

PeterHamfelt avatar Apr 09 '24 13:04 PeterHamfelt

How do we handle non semver versions in dependencies?

DanielNoord avatar Apr 09 '24 14:04 DanielNoord

Good point, we must use string in the data structure not tuple and then handle the complexity internally. Other versioning schemes I know are:

  • Date with (month and days ?) + character not in alpha/beta/rc/etc. (hash?) 2022.3.23f1 (unity). This looks almost like proper semver..
  • Rounded up date 24.1a1) (black alpha versions), but the actual version have the months (https://github.com/psf/black/releases/tag/23.12.0)
  • Improper semver 2.3.1, then 2.4, then 2.4.1(from antique colleague with a perl background, hi Benoit πŸ‘‹ <3)

We can't use use alphabetical ordering on the string because we need to detect that 9.9.9 is less than 10.0.0, and 2022.9.23f1 is less than 2022.10.23f1 and 24.1a1 is less than 24.9.1 which is less than 24.10.1. I guess it's split on the dot and then try to cast to int. If the versioning scheme does not have separation with dot, then we try to sort it alphabetically. If sorting it alphabetically does not result in something chronological then we don't support it ? Do such schemes exists and do they deserve pylint's checkers πŸ˜„ ? (This part is probably going to be fun.)

Pierre-Sassoulas avatar Apr 09 '24 15:04 Pierre-Sassoulas

I think we should utilize something like https://github.com/pypa/packaging.

DanielNoord avatar Apr 09 '24 15:04 DanielNoord

On the one hand, yeah definitely. On the other hand I'd really hate to add a new dependency. Maybe we can live with supporting only semver or semver adjacent.

Pierre-Sassoulas avatar Apr 09 '24 15:04 Pierre-Sassoulas

Maybe this doesn't address your concern Pierre but maybe it could be added as an extra dependency since users of Pylint itself wouldn't need it; whereas the plug-in dependency on Pylint would be pylint[packaging]. ( Similar to Celery in the way that if you need a redis task queue you would install as celery[redis].)

mbyrnepr2 avatar Apr 09 '24 20:04 mbyrnepr2

On the one hand, yeah definitely. On the other hand I'd really hate to add a new dependency. Maybe we can live with supporting only semver or semver adjacent.

If pylint does not support lib-version dependency, what would be the recommended approach to handle this locally for pylint plugins?

PeterHamfelt avatar Apr 10 '24 06:04 PeterHamfelt

it could be added as an extra dependency

If we add it as an extra, we'll still have to deal with all the issue associated with a new dependency (for dill we had to track an issue for months, then Daniel had to create our own version of dill so we can release pylint in time for python 3.12). My main concern is the maintenance cost and the bloat. The complexity seems high even if we only take what we need and vendor it in pylint : https://github.com/pypa/packaging/blob/32deafe8668a2130a3366b98154914d188f3718e/src/packaging/version.py#L117-L146 / https://github.com/pypa/packaging/blob/32deafe8668a2130a3366b98154914d188f3718e/src/packaging/version.py#L200-L202

What do you mean by "lib-version dependency" @PeterHamfelt ?

Pierre-Sassoulas avatar Apr 11 '24 11:04 Pierre-Sassoulas

I understand your concern @Pierre-Sassoulas. As this issue mainly adheres to pylint-ml, I'm interested in implementing simple version checks for dependencies, such as checking the pandas version which follows the "x.x.x" format (https://pandas.pydata.org/docs/whatsnew/index.html). Initially, at least.

Since such checks are not currently supported by pylint, what would be the recommended generic approach for incorporating this type of functionality into pylint plugins?

PeterHamfelt avatar Apr 12 '24 08:04 PeterHamfelt

I'm interested in implementing simple version checks for dependencies, such as checking the pandas version which follows the "x.x.x" format

That would be semver, It's the most common versioning scheme, if we do the feature we're going to support this whatever we choose :)

Let's vote on what to do. Please let me know if you think of another solution, I'll add it to the vote :

  • support semver only (i.e. : split on . cast to int, use the int tuple for ordering version, fail if anything unexpected happens), without any new dependency πŸ‘€
  • support semver and "close to semver" (i.e. : split on ., try to cast to int, use int tuple or string tuple or full string for ordering versions), without any new dependency πŸ˜„
  • vendor what pypa/packaging is doing to support everything but without dependency ❀️
  • Add pypa/packaging as a dependency and support everything πŸš€
  • semver only in pylint core by default, use dependency injection on the plugin side πŸ‘

Pierre-Sassoulas avatar Apr 13 '24 20:04 Pierre-Sassoulas

"Add pypa/packaging as a dependency and support everything πŸš€" In my opinion, in the best of worlds, this would always be the best choice. However, is it complex to implement? Worth it?

PeterHamfelt avatar Apr 15 '24 09:04 PeterHamfelt

@jacobtylerwalls do you have an opinion on the subject ?

Pierre-Sassoulas avatar Apr 15 '24 09:04 Pierre-Sassoulas

I'm for adding pypa/packaging. It looks stable, small footprint, and already supports python 3.13. dill is a much more sensitive subject area. I don't think we're taking on a big risk to add it.

jacobtylerwalls avatar Apr 15 '24 12:04 jacobtylerwalls

Sorry for not stating that earlier, probably due to lack of sleep, but two reasons why I'm reluctant to add a dependency if it's possible to avoid it, are:

Dependencies are vastly more disruptive in pylint than in any other projects. pylint must be installed alongside a project's own dependencies to properly analyses a project. So it means pylint dependencies can conflict with the project's (i.e. we want pypa/packaging > 1 in a project requiring pypy/packaging < 1 and pylint then can't be installed at all. (For example, because of a clash with typing_extensions (*) pylint was unusable on pytorch's projects for multiple months).

Also, one first step to increase the security of a package is to set the dependencies to exact versions (so that a new corrupted version of pypa/packaging might not retro-actively compromise all the old version of pylint). It means we'd want to be able to require pypa/packaging == 1.0.1 exactly and pylint would then clash with not only pypy/packaging < 1 but also with pypy/packaging > 1.01. Pylint could get financial support from external organisms if we do this kind of thing, but in the past I said "not reasonable" (considering the point above, pylint would often become uninstallable), and they said " πŸ’° ❌. πŸ‘‹ ! ". Something to consider if we want to send a signal that we're interested by sponsoring from big actors.

So πŸ˜„ / ❀️ would be my personal choice.. anyway if the votes stays what it is with the new info, I'll disagree and commit :)

(*) in this case pytorch was requiring a specific old version, but this could happens more often if more maintainers choose to set dependencies to specific versions because of point 2

Pierre-Sassoulas avatar Apr 15 '24 13:04 Pierre-Sassoulas

I suppose we could go forπŸ˜„ orπŸ‘€ initially and if future plugins need the added functionality of the dependency, it could be considered then (we could change our mind)?

mbyrnepr2 avatar Apr 15 '24 13:04 mbyrnepr2

Can we delegate all of this decision making to the plugin?

jacobtylerwalls avatar Apr 15 '24 13:04 jacobtylerwalls

Nice 🧠 ! Yes, permitting to inject a comparison function on the plugin side and provide semver by default in pylint "core" would be a really nice design.

from pylint.checkers import BaseChecker

def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool: 
    return min_version < actual_version < max_version

class PandasSomeEsotericLibChecker(BaseChecker):
    msgs = {
        "E0042": (
            "Some pandas/numpy version specific message",
            "pandas-numpy-message",
            "Some pandas/numpy version specific message rational",
            {"minversion": (3, 5), "libversion": {
                "pandas" : {
                    "minversion": "1.0.1",
                    "maxversion": "2.0.1",
                    "onmissing": "fatal",
                    "versioncomparator": None
                 }, 
                "someesotericlib":  {
                    "minversion": "duck", 
                    "maxversion": "spoon",
                    "versioncomparator": _esoteric_lib_comparator
                },
            }},
        ),
    }

not sure about "versioncomparator", sounds more like a check to see if the installed lib is in range. so "versionvalidator" ?

Pierre-Sassoulas avatar Apr 15 '24 13:04 Pierre-Sassoulas

Nice 🧠 ! Yes, permitting to inject a comparison function on the plugin side and provide semver by default in pylint "core" would be a really nice design.

from pylint.checkers import BaseChecker

def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool: 
    return min_version < actual_version < max_version

class PandasSomeEsotericLibChecker(BaseChecker):
    msgs = {
        "E0042": (
            "Some pandas/numpy version specific message",
            "pandas-numpy-message",
            "Some pandas/numpy version specific message rational",
            {"minversion": (3, 5), "libversion": {
                "pandas" : {
                    "minversion": "1.0.1",
                    "maxversion": "2.0.1",
                    "onmissing": "fatal",
                    "versioncomparator": None
                 }, 
                "someesotericlib":  {
                    "minversion": "duck", 
                    "maxversion": "spoon",
                    "versioncomparator": _esoteric_lib_comparator
                },
            }},
        ),
    }

not sure about "versioncomparator", sounds more like a check to see if the installed lib is in range. so "versionvalidator" ?

This looks good to me πŸ˜ƒπŸΌ

PeterHamfelt avatar Apr 15 '24 19:04 PeterHamfelt

Do we have a decision? Starting with a simpler version handling (πŸ˜„ / ❀️) and in the future aim to add pypa/packaging as a dependency? It would be great to have this in place when I start to add more and more checkers into pylint-ml πŸ˜ƒ

PeterHamfelt avatar Apr 17 '24 08:04 PeterHamfelt

Can versioncomparator becomes something like Callable[[], bool] | None where the callable should just do everything necessary to do the version comparison in the callable?

DanielNoord avatar Apr 17 '24 08:04 DanielNoord

I'm not sure what you mean by that, could you upgrade the _esoteric_lib_comparator example for it to be a Callable[[], bool] | None ? I suppose we must provide target min/max and actual version to the injected function (i.e. def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool: ?

Pierre-Sassoulas avatar Apr 17 '24 09:04 Pierre-Sassoulas

Just make:

def _esoteric_lib_comparator() -> bool: ...

If we need to pass the version we need to have a way to get the version of a package for which we might do importlib or packaging magic. Let's just also offload that to the plugin? Just make libversion an object that we can iterate over and call whatever is under versioncomparator?

DanielNoord avatar Apr 17 '24 09:04 DanielNoord

Ha, right, I understand. Are there a lot of other ways to recover the version ? Maybe we could provide another injection point for that if that's the case:

import antigravity
from pylint.checkers import BaseChecker

def _esoteric_lib_version_comparator(actual_version: str, min_version: str, max_version)-> bool: 
    return min_version < actual_version < max_version

def _esoteric_lib_version_recoverer() -> str: 
    return antigravity.push("elephant")

class PandasSomeEsotericLibChecker(BaseChecker):
               #...
                "someesotericlib":  {
                    "minversion": "duck", 
                    "maxversion": "spoon",
                    "versioncomparator": _esoteric_lib_version_comparator,
                    "versionrecoverer": _esoteric_lib_version_recoverer,
                },

That way you don't have to redefine the version recovering if you want to handle something else than semver. We'd have to implement the default way to do it in pylint so versionrecoverer is optional and convenient to use.

Pierre-Sassoulas avatar Apr 17 '24 11:04 Pierre-Sassoulas

If we need to pass the version we need to have a way to get the version of a package for which we might do importlib or packaging magic. Let's just also offload that to the plugin?

I think this is too paranoid. :D importlib is in the stdlib, and Mark already showed a snippet I trust to get the version. Let's keep the API for this as small as we can.

jacobtylerwalls avatar Apr 17 '24 11:04 jacobtylerwalls

I mean, the docs are full of warnings and caveats. I would personally prefer delegating all of this to plugins by making the API have no arguments at all. We can always add an argument later if plugins can't get the data they need themselves. Removing arguments later is much more of a hassle.

DanielNoord avatar Apr 17 '24 12:04 DanielNoord

Sorry, I'm not grokking the caveats. Do you have one in that doc you might point me to?

jacobtylerwalls avatar Apr 17 '24 13:04 jacobtylerwalls

"Specifically, it works with distributions with discoverable dist-info or egg-info directories, and metadata defined by the Core metadata specifications." "Important

These are not necessarily equivalent to or correspond 1:1 with the top-level import package names that can be imported inside Python code."

I'm mostly worried about an influx of issues about "my obscure poorly packaged python library doesn't work with pylint" etc. I don't really see why we need to pass so much to the plugins why they could call importlib themselves as well. That also means that the plugin for a specific package is also responsible for determining whether that package is in accordance with all the packaging standards.

DanielNoord avatar Apr 17 '24 14:04 DanielNoord