pylint
Slow with all checks disabled using pandas + dataclass
Bug description
Repro:
- mkdir test && cd test
- python3 -m venv venv
- venv/bin/pip install pylint pandas
- Write foo.py:

      from dataclasses import dataclass
      from pandas import DataFrame

      @dataclass
      class Foo:
          a: DataFrame

- time pylint --disable=all foo.py
And this consistently takes 8s to run on my machine. Doing any of the following brings the runtime down to <2s:
- Comment out the dataclasses import
- Comment out the pandas import
- Comment out the @dataclass decorator
- Use @foo instead of @dataclass
- Use @dataclasses.dataclass instead of @dataclass
- Use int instead of pandas.Series
- Use Optional[DataFrame] instead of DataFrame
- Uninstalling pandas from the environment
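The comparisons in the list above can be automated. The following is a minimal sketch, not part of the original report: it writes a few of the variants to a temporary foo.py and times `python -m pylint --disable=all` on each. The variant labels and the helper names `time_command` and `compare_variants` are made up for illustration, and the `import dataclasses` line in the qualified variant is an assumption about how that variant was written.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Source variants based on the list above; the labels are hypothetical.
VARIANTS = {
    "original": (
        "from dataclasses import dataclass\n"
        "from pandas import DataFrame\n\n"
        "@dataclass\n"
        "class Foo:\n"
        "    a: DataFrame\n"
    ),
    "no_decorator": (
        "from dataclasses import dataclass\n"
        "from pandas import DataFrame\n\n"
        "class Foo:\n"
        "    a: DataFrame\n"
    ),
    # Assumption: the qualified form was paired with `import dataclasses`.
    "qualified_decorator": (
        "import dataclasses\n"
        "from pandas import DataFrame\n\n"
        "@dataclasses.dataclass\n"
        "class Foo:\n"
        "    a: DataFrame\n"
    ),
    "optional_annotation": (
        "from dataclasses import dataclass\n"
        "from typing import Optional\n"
        "from pandas import DataFrame\n\n"
        "@dataclass\n"
        "class Foo:\n"
        "    a: Optional[DataFrame]\n"
    ),
}


def time_command(cmd, cwd=None):
    """Return the wall-clock seconds a subprocess takes to finish."""
    start = time.perf_counter()
    # check=False: a failing command is still timed rather than raising.
    subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start


def compare_variants(variants=VARIANTS):
    """Write each variant to a temporary foo.py and time pylint on it."""
    timings = {}
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "foo.py"
        for label, source in variants.items():
            target.write_text(source)
            timings[label] = time_command(
                [sys.executable, "-m", "pylint", "--disable=all", target.name],
                cwd=tmp,
            )
    return timings
```

Calling `compare_variants()` from the virtualenv that has pylint and pandas installed returns a dict mapping each label to elapsed seconds. Single runs are noisy, so repeating the call a few times gives a fairer comparison.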
Configuration
No response
Command used
time pylint --disable=all foo.py
Pylint output
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
--disable=all foo.py 7.93s user 0.28s system 111% cpu 7.387 total
Expected behavior
Disabling all checks should not take 8 seconds for this small file.
Pylint version
pylint 2.12.2
astroid 2.9.3
Python 3.9.7 (default, Sep 3 2021, 12:45:31)
[Clang 12.0.0 (clang-1200.0.32.29)]
OS / Environment
OSX 10.15 (Catalina)
Additional dependencies
astroid==2.9.3
isort==5.10.1
lazy-object-proxy==1.7.1
mccabe==0.6.1
numpy==1.22.2
pandas==1.4.1
platformdirs==2.5.1
pylint==2.12.2
python-dateutil==2.8.2
pytz==2021.3
six==1.16.0
toml==0.10.2
typing_extensions==4.1.1
wrapt==1.13.3
:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:
~Update: the minimal repro might be fast with the version of pylint on master, so this might not be an issue anymore.~
Never mind, forgot to install pandas. When pandas is installed, the minimal repro is still slow using the version of pylint on main.
Related, will there be a pylint release anytime soon?
Hi @brandon-leapyear, thank you for opening the issue. The next milestone for pylint is https://github.com/PyCQA/pylint/milestone/49, which is 89% done right now. We need a release of astroid in order to close it; that milestone is https://github.com/PyCQA/astroid/milestone/25, which is 70% done right now.
I'm able to reproduce this, though not at 8s:
time pylint --disable=all foo.py
real 0m5.106s
user 0m5.353s
sys 0m0.341s
5s is still quite shocking so I agree this issue is worth having. Though I wonder how to even tackle an issue like this?
Out of curiosity, I ran time pylint test.py (so not disabling anything, just running whatever checks are enabled via config) and I'm getting something pretty similar:
time pylint test.py
real 0m5.377s
user 0m5.536s
sys 0m0.388s
Then I removed the pandas usage from the file, and the time came down significantly, both with checks enabled and with all checks disabled:
real 0m0.886s
user 0m0.653s
sys 0m0.113s
So to me this is an issue related to pylint and pandas, not to disabling checks.
> Though I wonder how to even tackle an issue like this?
There's documentation about profiling for contributors here: https://pylint.pycqa.org/en/latest/development_guide/contributor_guide/profiling.html
Cool. An investigation with a profiler is definitely needed here to get some data on what's going on!
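As a starting point for such an investigation, here is a minimal sketch using the standard-library cProfile module. It is not taken from the pylint docs or from anyone in this thread; `profile_call` and `lint_file` are hypothetical helper names. It assumes `pylint.lint.Run` accepts `exit=False`, which should hold for the pylint version in this report.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sorting by cumulative time surfaces the call chains that dominate the run.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def lint_file(path):
    """Lint one file in-process with all checks disabled."""
    # Imported lazily so profile_call stays usable without pylint installed.
    from pylint.lint import Run

    # exit=False keeps Run from calling sys.exit() once linting finishes.
    Run(["--disable=all", path], exit=False)
```

Running `print(profile_call(lint_file, "foo.py"))` in the repro directory prints the functions with the highest cumulative time, which should show whether the time goes into inference, import resolution, or something else.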
@clavedeluna There's another profiler called Yappi that I've used to great effect in Pylint profiling. I wrote up some instructions for it here: https://nickdrozd.github.io/2022/04/12/performance-hot-spots.html
I also came across https://github.com/bloomberg/pytest-memray recently and wanted to check what it can do for pylint. I have not tried anything yet.