megalinter icon indicating copy to clipboard operation
megalinter copied to clipboard

Pylint should not look to import modules by default

Open schorlton opened this issue 1 year ago • 8 comments

Upgraded to v6 and now getting lots of import-errors from pylint. It looks like a config was added for v6?

Suggest defaulting to disable=import-error in the config as it shouldn't be expected to run megalinter in an env with all python modules. Eg. I'm running using the official megalinter docker image (v6.1.0). Not sure why I wasn't hitting this before.

Thanks!

schorlton avatar Jul 28 '22 06:07 schorlton

Can you elaborate on why import-error should be disabled by default for all users as opposed to your own project? I haven't encountered issues with this rule personally.

Kurt-von-Laven avatar Jul 28 '22 19:07 Kurt-von-Laven

If you import any external library, these need to be installed into python env of the docker image before running megalinter. So either I need to build my own docker image to lint based on megalinter with all of my libs (and keep it up to date), or run a lengthy install process every time I run megalinter using the official image to first install all of the deps. Neither of which seem great options for running with Docker?

schorlton avatar Jul 29 '22 06:07 schorlton

Thank you for clarifying; I get where you are coming from now. MegaLinter mounts the project directory in the Docker image, so it seems like it should be possible to reuse the same virtualenv you use for local development? We use a similar technique in a context that is unrelated to MegaLinter where we configure Poetry to locate the virtualenv in the project directory and prepend .venv/bin to the PATH so that the virtualenv's Python will be used. Pipenv locates the virtualenv within the project directory by default as I recall. In the context of MegaLinter, you should be able to make use of the PRE_COMMANDS config option to modify the PATH. An entirely different strategy might be to mount your package manager's cache within the Docker image. I realize that neither of these approaches are exactly a frictionless out-of-the-box experience, but I'm also uncertain how common of a problem long Python dependency install times are, and in general I hesitate to nudge people towards shallower dependency-blind linting since I assume that limits the issues Pylint can catch. Anyone whose dependencies take too long to install and doesn't wish to prioritize improving the performance of that process can of course disable Pylint's import-error rule. In the long run, the very real pain point you have raised (as well as the need to maintain a fleet of Docker images and the fact that any given project may not exactly match one of our flavors) might be eliminated if MegaLinter operated in an on-the-fly manner akin to npx, but I don't anticipate having the bandwidth to open that can of worms any time soon unfortunately, and it would presumably sacrifice the isolation benefits of containerization.

Kurt-von-Laven avatar Jul 29 '22 11:07 Kurt-von-Laven

Mounting Python packages from your local directory to the Docker container could be a problem if the Python package has a specific installation for an OS/platform. For example, you are running on Windows, and the Python package compiles a native interface, or interacts with a native application, this might mess things up. The docker container runs on Linux.

echoix avatar Jul 29 '22 12:07 echoix

Thank you both for your thoughtful responses! I agree with @echoix - another example is we currently run all of our code in containers, with dependencies baked into the image...we have no way to get those Python modules easily mounted by Megalinter. This also seems like a regression as v5 did not have this issue. Any way, I can just disable the import rule for now. Interestingly, it seems like pylint is not running with default config as I tried creating a blank .pylintrc just disabling this rule, but that triggered a whole bunch of other errors which must be configured in the megalinter .pylintrc.

@Kurt-von-Laven, that's an interesting idea for the dynamic runtime. I'm currently hitting an issue where GitLab pulling the image takes over an hour...but I think I'll open a separate thread for that.

Thanks again!

schorlton avatar Jul 29 '22 13:07 schorlton

Ah, that is a good point, @echoix. In that specific case, I suggest WSL (Windows Subsystem for Linux), but in general I don't see why it wouldn't be possible to leverage PRE_COMMANDS to install dependencies from within the Docker container to a directory (e.g., the project directory) that will be mounted by subsequent Docker containers.

we currently run all of our code in containers, with dependencies baked into the image...we have no way to get those Python modules easily mounted by Megalinter

It seems like it should be possible to maintain a simple shell script that runs a few package manager commands (e.g., poetry install) that is reused by your Dockerfile(s) and called from PRE_COMMANDS by MegaLinter. In other words, it sounds like a matter that can be addressed by appropriate decomposition.

Maybe @nvuillam has thoughts on what may have changed in v6.

Sounds good; happy to discuss in a separate thread. Short answer: cache the Docker image (a la docker-cache for GitHub Actions). Longer answer: it sounds like GitLab has terrible pull times from Docker Hub. GitHub Actions pulls much faster than that and also offers GitHub Container Registry with still faster pull times. Maybe GitLab offers some analogue to GHCR or, worst case, may want you to pay them more money in exchange for faster performance. It sounds like another issue that would be at least partially ameliorated by moving towards an on-the-fly approach.

Kurt-von-Laven avatar Jul 29 '22 23:07 Kurt-von-Laven

In v6, default config file name has changed from .python-lint to .pylintrc , is it possible it had an impact ?

(pylint's .pylintrc is the official default file name , wherease .python-lint was an inheritance from Super Linter, such issue has been reported several times )

nvuillam avatar Aug 09 '22 05:08 nvuillam

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

github-actions[bot] avatar Sep 09 '22 01:09 github-actions[bot]

Reopening this...I still think that this rule should be disabled by default. It should be possible to lint python code without installing dependencies into the megalinter image...I have no way to do this as I have hundreds of conda dependencies. It seems like you even do this for your own repo: https://github.com/oxsecurity/megalinter/issues/1866#issuecomment-1244149615

Thanks for your consideration!

schorlton avatar Oct 03 '22 20:10 schorlton

@schorlton why not just defining a local .pylintrc in your repo to disable import error issue ?

nvuillam avatar Oct 13 '22 05:10 nvuillam

why not just defining a local .pylintrc in your repo to disable import error issue?

I want to use check import errors locally with Pylint, but at the same time, use minimal configuration for MegaLinter, so I don't want to disable Pylint in MegaLinter nor I want to have two .pylintrc files.

Checking for the imports by default makes sense, but, given the common usage through an image in GitHub Actions, checking for imports in an environment which is not expected to have them is not expected behavior.

While applying the Pylint default rules often makes sense at least for simple Python projects, non-standard imports are common even for simple projects, so the current behavior requires to always customize the configuration even for these simple projects. (...but perhaps small projects which don't need much custom configuration are not the main target of MegaLinter and that's a fair answer.)

wenzeslaus avatar Nov 09 '22 17:11 wenzeslaus

@schorlton @wenzeslaus MegaLinter wants to address any kind of projet in any language :)

About imports... if you want to let the rule active, maybe you could install python dependencies with a PRE_COMMAND ?

https://github.com/oxsecurity/megalinter/issues/2017#issuecomment-1298771836

nvuillam avatar Nov 09 '22 17:11 nvuillam

Thanks, I was not aware of PRE_COMMANDS. It may get involved for me, but seems like the way to go for smaller number of pure-Python dependencies which will cover at least the small Python project cases.

wenzeslaus avatar Nov 09 '22 18:11 wenzeslaus

@nvuillam @Kurt-von-Laven Please consider https://github.com/oxsecurity/megalinter/issues/1665#issuecomment-1265971531: installing the dependencies is very costly in some cases. This is a design problem: when Mega-Linter is a separate CI/CD job, and a release CI/CD job exists that's gated on Mega-Linter success, given that the release job is a container build, then installing the dependencies is required for both the release job and the Mega-Linter job. That's inefficient. One could split the release job up, mount directories for the container build, and share a cache, but that's pretty involved isn't it?

sanmai-NL avatar May 11 '23 12:05 sanmai-NL

@sanmai-NL we are in open-source, anything is possible :)

To avoid to install twice the dependencies I could suggest to use POST_COMMANDS for the release, but those commands probably deserve to be in their own repo.

Another thing you could do could be having another pylintrc with import-error disabled, that would be used only within MegaLinter thanks to a variable PYTHON_PYLINT_CONFIG_FILE defined in the CI job env variables.

Ex: PYTHON_PYLINT_CONFIG_FILE=.pylintrc-ci

nvuillam avatar May 11 '23 18:05 nvuillam

My bias is to only disable checks by default that are annoying/unhelpful/problematic/wrong in a significant majority of cases since I expect extra checks to rarely be explicitly enabled relative to the rate at which irrelevant checks are disabled.

This is a design problem: when [MegaLinter] is a separate CI/CD job, and a release CI/CD job exists that's gated on [MegaLinter] success, given that the release job is a container build, then installing the dependencies is required for both the release job and the [MegaLinter] job. That's inefficient. One could split the release job up, mount directories for the container build, and share a cache, but that's pretty involved isn't it?

I might be missing something, but how is this different from running any sort of containerized test or linting in CI? What you are describing is essentially my base presumption of the functioning of a well-designed CI/CD pipeline that uses containerization. Personally, I find that containerization is often more hassle than it's worth, but obviously I feel the benefits of MegaLinter greatly outweigh the hassles of Docker. We are able to avoid the hassle of mounting dependencies by installing them in the project directory (c.f., https://github.com/oxsecurity/megalinter/issues/1665#issuecomment-1199193372) and running MegaLinter on Linux so that we aren't installing our dependencies for the wrong platform (c.f., https://github.com/oxsecurity/megalinter/issues/1665#issuecomment-1199198343).

Kurt-von-Laven avatar May 11 '23 23:05 Kurt-von-Laven

A possible solution would be to let MegaLinter report its results with full DevOps platform integration, even if it is called embedded within a CI job, rather than from a dedicated MegaLinter CI job. The downside would be that the linting/MegaLinter activity's success would be conflated with the embedding activity's success.

P.S.: I'm not claiming this integration is not on par currently, I'm actively exploring that and will report back a solution. But the current tutorials/docs advise a dedicated MegaLinter CI job, which doesn't fit this standard of a ‘functioning of a well-designed CI/CD pipeline’, that @Kurt-von-Laven argues.

Side note: @Kurt-von-Laven The About field on https://github.com/oxsecurity/megalinter calls this product Mega-Linter with a dash.

sanmai-NL avatar May 12 '23 06:05 sanmai-NL

@nvuillam I was only talking about the design problem, I'm not implying you should fix it yourself. ;)

Another thing you could do could be having another pylintrc with import-error disabled, that would be used only within MegaLinter thanks to a variable PYTHON_PYLINT_CONFIG_FILE defined in the CI job env variables.

Ex: PYTHON_PYLINT_CONFIG_FILE=.pylintrc-ci

The problem with that is that it requires a second invocation of PyLint. The second one would cover critical checks you'd want to perform earlier than CI already. If you instead run those checks another time, you're again doing extra work. Potentially non-trivial work, since PyLint is or can be pretty slow.

sanmai-NL avatar May 12 '23 06:05 sanmai-NL

If someone feels up to contributing better integration between MegaLinter and Azure DevOps (or any other CI for that matter), I imagine this would be something many people would benefit from. I'm not sure I see the connection to what I was saying earlier; we may be miscommunicating about that, but we welcome your proposal. Please feel free to file a new issue with your ideas.

Thanks; Mega-Linter was the original name. I updated the dangling reference you spotted.

Keep an eye on charliermarsh/ruff#970 since Ruff reaching feature parity with Pylint sounds like it would benefit you considering how insignificant the cost of running Ruff twice is.

Kurt-von-Laven avatar May 13 '23 09:05 Kurt-von-Laven

Yes, we also use Ruff but the design problem is more general of course.

A CI job is also a general concept and the problem is not related to integration with CI services. A CI job generally is an isolated workload that's a segment of a CI pipeline. The problem is in the isolation.

sanmai-NL avatar May 13 '23 15:05 sanmai-NL

I am also running into this issue. The current implementation feels off. My current project only has two pip dependencies, so I could handle using the PYTHON_PYLINT_PRE_COMMANDS, but then I have to repeat this multiple times for different python linters. Is there a one liner where I can install deps for all python linters?

Even if that one liner exists I still agree with op as there are other projects I work one with lots of pip dependencies and it starts to feel quite onerous to have to install all those deps again for each linter.

daltonv avatar Jul 04 '23 06:07 daltonv

@daltonv you can use PRE_COMMANDS , that is run before all linters :)

nvuillam avatar Jul 06 '23 09:07 nvuillam

@nvuillam don't I have to specify the venv of the linter in the pre-command list though? I still would require mulitple repeats of the same pip install in different venvs.

daltonv avatar Jul 06 '23 11:07 daltonv

Hmmmmm indeed I did not think about that :/

@Kurt-von-Laven , dear python master... maybe you have a proposition ?

nvuillam avatar Jul 06 '23 12:07 nvuillam

I have done this and it works great for DRY:

x_pip_install: &pip_install pip install --no-cache-dir -r /tmp/requirements.txt

## Commands:
PRE_COMMANDS:
  - command: | # if you use `Pipfile`
      pip install --no-cache-dir 'pipenv~=2023.11'
      pipenv requirements > /tmp/requirements.txt
  - command: *pip_install
    venv: pylint
  - command: *pip_install
    venv: pyright

goetzc avatar Jan 18 '24 18:01 goetzc