check-manifest icon indicating copy to clipboard operation
check-manifest copied to clipboard

`check-manifest` became super slow

Open jugmac00 opened this issue 3 years ago • 7 comments

For e.g. https://github.com/jugmac00/flask-reuploaded (but also for my other projects), I notice an increased runtime for the checks.

The biggest jump was from version 0.40 to 0.41, but it looks like it is getting slower with each release...

I do run check-manifest via pre-commit, but I could also reproduce the problem with a standalone version of check-manifest.

I only had a quick look and looks like the pyproject.toml support could have introduced the degraded performance. The runtime difference between 0.45 and 0.46 can be hardly explained by the changelog.

no check-manifest

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
bandit...................................................................Passed

real    0m1,113s
user    0m1,760s
sys     0m0,245s



0.40

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m1,577s


0.41

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m4,366s
user    0m4,471s
sys     0m0,511s


0.42

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m4,401s
user    0m4,557s
sys     0m0,456s

0.43

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m4,743s
user    0m4,808s
sys     0m0,503s

0.44

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

0.45

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m8,464s
user    0m8,068s
sys     0m0,878s

0.46

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m10,674s
user    0m10,093s
sys     0m1,118s

tests without pre-commit

0.40

❯ time check-manifest 
lists of files in version control and sdist match

real	0m0,440s
user	0m0,388s
sys	0m0,055s

0.41

❯ time check-manifest 
lists of files in version control and sdist match

real	0m3,311s
user	0m2,848s
sys	0m0,245s

0.45

❯ time check-manifest 
lists of files in version control and sdist match

real	0m9,664s
user	0m8,324s
sys	0m0,812s

0.46

❯ time check-manifest 
lists of files in version control and sdist match

real	0m10,041s
user	0m8,671s
sys	0m0,875s

So, the changelog maybe is not really complete - as even without using pre-commit there is a performance degradation between 0.45 and 0.46 which only changelog entry says pre-commit now uses Python 3.

jugmac00 avatar Apr 07 '21 15:04 jugmac00

I don't know what to say. All other changes in 0.46 were trivial cleanups/refactorings and the switch from Travis CI to GitHub Actions.

I wonder if it would be useful to have check-manifest print the times it took to build the sdists in verbose mode, but of course that wouldn't help the comparisons.

I suppose you could use strace with timings enabled...

mgedmin avatar Apr 07 '21 17:04 mgedmin

Thanks for your reply!

I never used strace before, but this looks not so good :-/

22:45:00 read(4, "lists of files in version contro"..., 4096) = 50
22:45:09 read(4, 0x1848090, 4096)       = -1 EIO (Input/output error)

Then I learned about the new option --no-build-isolation ( https://github.com/mgedmin/check-manifest#version-control-integration )

Then it looks like

22:52:28 read(4, "lists of files in version contro"..., 4096) = 50
22:52:30 read(4, 0x1512db0, 4096)       = -1 EIO (Input/output error)

Way better!

You can reproduce this with https://github.com/jugmac00/flask-reuploaded

Actually, I am happy again with check-manifest taking 2 s, but I am pretty curious what you could tell me about the above strace outputs.

jugmac00 avatar Apr 08 '21 21:04 jugmac00

Huh, it looks like you're stracing pre-commit rather than check-manifest. I was hoping to see the clone()/execve()/wait() from check-manifest invoking the actual sdist build thing (python setup.py sdist or the pep-517 thing I forget what it's called).

If you pass -f to strace, it'll trace all the child processes. I have a script to make it easier figure out which pid maps to which process: https://pypi.org/project/strace-process-tree/

Anyway, it looks like you've identified that the main slowdown is due to using isolated PEP-517 builds. And as for 0.45 -> 0.46 regression, is there any chance it can be explained by measurement errors (due to CPU frequency scaling or something)?

mgedmin avatar Apr 09 '21 07:04 mgedmin

I am pretty curious what you could tell me about the above strace outputs.

You see pre-commit reading check-manifest's output from a pipe (fd 4). The timestamp indicates when the read() syscall started, you might want to also pass -T to strace to see how long each syscall took to run.

I cannot explain why the next read returns EIO rather than 0 (for EOF) or EAGAIN (in case pre-commit uses nonblocking I/O). Were there any syscalls between the two read() calls? Are you on Linux?

mgedmin avatar Apr 09 '21 07:04 mgedmin

Hi Marius, thanks for taking the time to give me a quick introduction to strace.

I repeated the strace with -f directly for check-manifest which gave me 60k lines of output oO https://gist.github.com/jugmac00/ee8068afa2f30be57de44401e7ab474e

I lack both the knowledge and the time to get anything useful out of the output.

As I found a working solution ( --no-build-isolation), I came here to close the issue, as I do not want you to dig deeper, as for reasons I seem to be the only one affected.

Just when I wanted to close the issue, I saw that someone else linked to this issue - see above my comment.

Up to you what you do with this issue - if you want me to do perform any tests, just say so, but I am afraid I do not have the time to learn to interpret the output of strace currently.

jugmac00 avatar Apr 14 '21 11:04 jugmac00

Heh, yes, strace files are something. I wrote strace-process-tree so I wouldn't have to interpret them myself.

Now strace-process-tree doesn't like this format, but after some careful massaging with vim to have a pid on every line and avoid 'strace: Process NNN attached' messages breaking clone(... lines in half, I get:

8550 check-manifest [11.2s @0.0s]
  ├─8551 /bin/sh -c "uname -p 2> /dev/null" [0.0s @0.7s]
  │   └─8552 uname -p [0.0s @0.7s]
  ├─8553 git ls-files -z [0.0s @0.8s]
  ├─8554 /home/jugmac00/.local/pipx/venvs... -m build --sdist . --outdir /tmp/check-manifest-ol8rlbqi-sdi... [5.2s @0.9s]
  │   ├─8555 /home/jugmac00/.local/pipx/venvs... -m pip install --prefix /tmp/build-env-qwm98rzv --ignore-installed --no-warn-script-location -r /tmp/build-reqs-blpa_v85.txt [3.8s @1.3s]
  │   │   ├─8556 /sbin/ldconfig -p [@2.3s]
  │   │   ├─8556 /sbin/ldconfig.real -p [0.0s @2.3s]
  │   │   ├─8557 lsb_release -a [0.3s @2.4s]
  │   │   │   └─8558 dpkg-query -f '${Version} ${Provides}
  │   │   │             ' -W lsb-core lsb-cxx lsb-graphics lsb-desktop lsb-languages lsb-multimedia lsb-printing lsb-security [0.1s @2.6s]
  │   │   ├─8559 uname -rs [0.0s @2.7s]
  │   │   └─8560 /bin/sh -c "uname -p 2> /dev/null" [0.0s @2.7s]
  │   │       └─8561 uname -p [0.0s @2.7s]
  │   └─8564 /tmp/build-env-qwm98rzv/bin/pyth... /home/jugmac00/.local/pipx/venvs... build_sdist /tmp/tmpbr_cffjf [0.8s @5.1s]
  │       └─8565 /bin/sh -c "uname -p 2> /dev/null" [0.0s @5.5s]
  │           └─8566 uname -p [0.0s @5.5s]
  └─8567 /home/jugmac00/.local/pipx/venvs... -m build --sdist . --outdir /tmp/check-manifest-km_njyqc-sdi... [5.0s @6.1s]
      ├─8568 /home/jugmac00/.local/pipx/venvs... -m pip install --prefix /tmp/build-env-250p6syu --ignore-installed --no-warn-script-location -r /tmp/build-reqs-es7qklvl.txt [3.5s @6.5s]
      │   ├─8569 /sbin/ldconfig -p [@7.4s]
      │   ├─8569 /sbin/ldconfig.real -p [0.0s @7.4s]
      │   ├─8570 lsb_release -a [0.2s @7.5s]
      │   │   └─8571 dpkg-query -f '${Version} ${Provides}
      │   │             ' -W lsb-core lsb-cxx lsb-graphics lsb-desktop lsb-languages lsb-multimedia lsb-printing lsb-security [0.1s @7.7s]
      │   ├─8572 uname -rs [0.0s @7.7s]
      │   └─8573 /bin/sh -c "uname -p 2> /dev/null" [0.0s @7.8s]
      │       └─8574 uname -p [0.0s @7.8s]
      └─8575 /tmp/build-env-250p6syu/bin/pyth... /home/jugmac00/.local/pipx/venvs... build_sdist /tmp/tmp24jy2wxz [1.0s @10.0s]
          └─8576 /bin/sh -c "uname -p 2> /dev/null" [0.0s @10.5s]
              └─8577 uname -p [0.0s @10.5s]

As you can see, there are two python -m build --sdist ... invocations that take 5.2 seconds and 5.0 seconds respectively, and then there's a second of overhead in check-manifest itself, most of it at the very beginning, before even the git ls-files invocation.

That second of overhead is something that I would be interested in optimizing, if I had the time.

The two python -m build --sdist invocations could be done in parallel, which might speed things up (assuming enough CPU cores and no fighting over I/O bandwidth). If I had time.

mgedmin avatar Apr 14 '21 12:04 mgedmin

Unfortunately, I don't have the time to look into this right now but wanted to say that I have also experienced this same issue. As you can see below I am getting similar results to @jugmac00 for version 0.47

$ time check-manifest --version
check-manifest version 0.47
check-manifest --version  0.17s user 0.06s system 98% cpu 0.235 total

$ time check-manifest
lists of files in version control and sdist match
check-manifest  8.31s user 2.35s system 92% cpu 11.560 total

tpvasconcelos avatar Oct 16 '21 18:10 tpvasconcelos