Slow with all checks disabled using pandas + dataclass

Open brandon-leapyear opened this issue 3 years ago • 7 comments

Bug description

Repro:

  1. mkdir test && cd test

  2. python3 -m venv venv

  3. venv/bin/pip install pylint pandas

  4. Write foo.py:

    from dataclasses import dataclass
    from pandas import DataFrame
    @dataclass
    class Foo:
        a: DataFrame
    
  5. time pylint --disable=all foo.py

And this consistently takes 8s to run on my machine. Doing any of the following brings the runtime down to <2s:

  • Comment out the dataclasses import
  • Comment out the pandas import
  • Comment out the @dataclass decorator
  • Use @foo instead of @dataclass
  • Use @dataclasses.dataclass instead of @dataclass
  • Use int instead of DataFrame
  • Use Optional[DataFrame] instead of DataFrame
  • Uninstalling pandas from the environment
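To compare several of the variations above in one go, a small timing script can help. The following is only a sketch: the variant labels, the `time_command` and `time_variants` helper names, and the exact contents of each variant are assumptions (one reading of the bullets), and it invokes pylint as `python -m pylint`, so it has to be run with the venv's interpreter.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# One reading of three of the bullets above; the labels are illustrative only.
VARIANTS = {
    "baseline": (
        "from dataclasses import dataclass\n"
        "from pandas import DataFrame\n"
        "@dataclass\n"
        "class Foo:\n"
        "    a: DataFrame\n"
    ),
    "int_field": (
        "from dataclasses import dataclass\n"
        "from pandas import DataFrame\n"
        "@dataclass\n"
        "class Foo:\n"
        "    a: int\n"
    ),
    "optional_field": (
        "from dataclasses import dataclass\n"
        "from typing import Optional\n"
        "from pandas import DataFrame\n"
        "@dataclass\n"
        "class Foo:\n"
        "    a: Optional[DataFrame]\n"
    ),
}


def time_command(cmd, cwd=None):
    """Run cmd once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # pylint exits non-zero when it reports messages
    )
    return time.perf_counter() - start


def time_variants(variants=VARIANTS):
    """Write each variant to a temporary foo.py and time pylint on it."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "foo.py"
        for name, source in variants.items():
            path.write_text(source)
            cmd = [sys.executable, "-m", "pylint", "--disable=all", str(path)]
            results[name] = time_command(cmd, cwd=tmp)
    return results
```

Calling `time_variants()` returns a dict mapping each label to its wall-clock time. Single runs are noisy, so repeating each measurement a few times gives a fairer comparison.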

Configuration

No response

Command used

time pylint --disable=all foo.py

Pylint output

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

 --disable=all foo.py  7.93s user 0.28s system 111% cpu 7.387 total

Expected behavior

Disabling all checks should not take 8 seconds for this small file.

Pylint version

pylint 2.12.2
astroid 2.9.3
Python 3.9.7 (default, Sep  3 2021, 12:45:31) 
[Clang 12.0.0 (clang-1200.0.32.29)]

OS / Environment

OSX 10.15 (Catalina)

Additional dependencies

astroid==2.9.3 isort==5.10.1 lazy-object-proxy==1.7.1 mccabe==0.6.1 numpy==1.22.2 pandas==1.4.1 platformdirs==2.5.1 pylint==2.12.2 python-dateutil==2.8.2 pytz==2021.3 six==1.16.0 toml==0.10.2 typing_extensions==4.1.1 wrapt==1.13.3

brandon-leapyear avatar Feb 25 '22 04:02 brandon-leapyear

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


~Update: the minimal repro might be fast with the version of pylint on master, so this might not be an issue anymore.~ Never mind, forgot to install pandas. When pandas is installed, the minimal repro is still slow using the version of pylint on main.

Related, will there be a pylint release anytime soon?

brandon-leapyear avatar Feb 25 '22 04:02 brandon-leapyear

Hi @brandon-leapyear, thank you for opening the issue. The next milestone for pylint is https://github.com/PyCQA/pylint/milestone/49, which is 89% done right now. We need a release of astroid in order to close it; that milestone is https://github.com/PyCQA/astroid/milestone/25, which is 70% done right now.

Pierre-Sassoulas avatar Feb 25 '22 06:02 Pierre-Sassoulas

I'm able to reproduce this, though not at 8s:

time pylint --disable=all foo.py

real	0m5.106s
user	0m5.353s
sys	0m0.341s

5s is still quite shocking so I agree this issue is worth having. Though I wonder how to even tackle an issue like this?

Out of curiosity, I ran time pylint test.py (so not disabling anything, just running whatever checks are enabled via config) and I'm getting something pretty similar:

time pylint test.py

real	0m5.377s
user	0m5.536s
sys	0m0.388s

Then I removed the pandas usage from the file, and the time came down significantly, both with the configured checks enabled and with all checks disabled:

real	0m0.886s
user	0m0.653s
sys	0m0.113s

So to me this is an issue with how pylint handles pandas, not with disabling checks.

clavedeluna avatar Nov 21 '22 21:11 clavedeluna

> Though I wonder how to even tackle an issue like this?

There's documentation about profiling for contributors here: https://pylint.pycqa.org/en/latest/development_guide/contributor_guide/profiling.html

Pierre-Sassoulas avatar Nov 21 '22 21:11 Pierre-Sassoulas

Cool. An investigation with a profiler is definitely needed here to get some data on what's going on!
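As a starting point, something like the following could collect that data with the standard library's cProfile. This is a rough sketch, not the approach from the profiling guide: `profile_call` and `profile_pylint` are hypothetical helper names, and it assumes that running pylint in-process through `pylint.lint.Run` with `exit=False` shows the same slowdown as the command line.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Profile func(*args, **kwargs); return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def profile_pylint(path="foo.py"):
    """Profile an in-process pylint run on path (requires pylint installed)."""
    # Imported lazily so profile_call stays usable without pylint.
    from pylint.lint import Run

    return profile_call(Run, ["--disable=all", path], exit=False)
```

Running `print(profile_pylint("foo.py"))` from the test directory should list the functions with the highest cumulative time, which would hint at where the time goes when pandas is involved.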

clavedeluna avatar Nov 21 '22 21:11 clavedeluna

@clavedeluna There's another profiler called Yappi that I've used to great effect in Pylint profiling. I wrote up some instructions for it here: https://nickdrozd.github.io/2022/04/12/performance-hot-spots.html

nickdrozd avatar Nov 21 '22 22:11 nickdrozd

I also came across https://github.com/bloomberg/pytest-memray recently and wanted to check what it can do for pylint. I haven't tried anything yet.

Pierre-Sassoulas avatar Nov 22 '22 08:11 Pierre-Sassoulas