distutils icon indicating copy to clipboard operation
distutils copied to clipboard

PERF: remove isdir() check in copy_file(), about 10% of the run time was checking if the file to copy is a directory.

Open morotti opened this issue 1 year ago • 5 comments

( I originally raised the PR to setuptools https://github.com/pypa/setuptools/pull/4406 and they redirected me to this repo. )

Hello, I was doing a bit of profiling on my CI builds. The step to build package (python setup.py bdist_wheel) is taking more time than I would like.

Going in with a commercial profiler, I am finding that nearly 10% of the time is taken in isdir() calls, mostly in the copy_file() function. Which is quite surprising IMO because the copy file function should not be given directories anyway :D

Can I offer to remove the isdir() calls to get a speedup? I've found 2 calls that were passing directories instead of a filepath and corrected them.

By the way, the whole _copy_file_contents() could be replaced by shutil.copyfile() for another little boost. The code goes back to year 2000 or before, which predates optimized shutil functions.

Screenshot of the profiler run. The whole run is about 20+ seconds. image

All tests are passing in the setuptools repos, including all integration tests. They don't seem to be able to run in this repo, failing in some macos imports

Regards.

morotti avatar May 31 '24 10:05 morotti

@jaraco I saw you fixed the tests on master. I've rebased and tests are green now for this PR. Can you have another look?

There is a diffcov workflow complaining about less test coverage? If i understand correctly, it's saying 2 lines are not covered by the tests, inside bdist_rpm code that's copying the icon of the RPM. I hope that's not a blocker.

morotti avatar Jun 20 '24 13:06 morotti

A couple of alternative ways we might be able to improve performance without changing the API:

  1. Instead of testing for is_dir, simply attempt each copy in a try/except that first attempts to copy to {dest}/{basename(source)} and if that fails, copy to dest.
  2. Wrap is_dir in a cache function so it doesn't have to go to the file system for repeated copies to the same dir.

jaraco avatar Jun 28 '24 12:06 jaraco

I wonder if it makes sense to shortcircuit the condition with a (yet another) new keyword only parameter (e.g. isdir: None | Literal[True] | Literal[False], "tri-state")? Something like:

def copy_file(  # noqa: C901
    src,
    dst,
    preserve_mode=1,
    preserve_times=1,
    update=0,
    link=None,
    verbose=1,
    dry_run=0,
+   *,
+   isdir=None,
):
   ...
-   if os.path.isdir(dst):
+   if isdir is True or (isdir is None and os.path.isdir(dst)):
        dir = dst
        dst = os.path.join(dst, os.path.basename(src))
    else:
        dir = os.path.dirname(dst)

The pro of such approach is that it is backwards compatible... The cons is that it introduces yet another argument in a function that is already full of arguments AND it feels very messy...

abravalheri avatar Jun 28 '24 13:06 abravalheri

I am trying to think of what to do without getting into horrible hacks 🤔 🤔 🤔
The caching is tempting as a simple solution but the invalidation is going to be a problem. e.g. code to uninstall a wheel then reinstall a different version. Having two functions copy_file_to_file() copy_file_to_dir() is a bit verbose but it's clear on what they do and won't cause surprises to the next maintainers or users.

morotti avatar Jul 01 '24 15:07 morotti

The caching is tempting as a simple solution but the invalidation is going to be a problem. e.g. code to uninstall a wheel then reinstall a different version.

That's a good point, although since is_dir() is checking for whether a target is a dir or a file, it seems unlikely that it would be one and then the other, even when installing different packages.

My instinct - unless there's a clean way to implement this performance improvement, we'll probably want to defer this improvement until distutils is no longer an independent code base (importable as distutils) and is fully integrated with Setuptools.

jaraco avatar Aug 27 '24 07:08 jaraco