make.py Performance improvement

I've noticed some in-efficiency in make.py implementation. Running make.py on one of my projects takes 3 seconds even if everything is up to date and nothing to build. After some investigation, I was able to reduce the running time from 3s to 0.4s.

The current implementation works by this way: The worker threads are driven by the main thread. The main thread travels over the dependency graph over and over again, to see whether there are works ready to do and queue them. And there is a 10ms sleep between iterations which causes a major slow down because my project requires many iterations to finish. And the main thread increases running time further by repetitively doing works like parsing d files, stat on dependencies.

Some obvious mitigations to these problems are replacing sleep with semaphore and adding memoization to costly functions, which I've implemented in my branch https://github.com/account-login/make.py/commit/f74284ff6c71dc4b6e8070f0c660cac99f7f360b.

I also have an idea to improve further: Instead of checking for work by traveling the graph top-down, the working threads could be driven by the finishing of dependency. When a target finishes, the working thread checks for its dependent and enqueue it when all its dependencies are ready. This improvement probably requires rewriting most of the code.

Jan 27 '21 14:01 account-login

Hey, sorry for the late response. Your branch looks good! Some comments:

Using real thread synchronization is a good idea, and would be cleaner than the current sleep approach. Are you sure that's related to the slowness in your null build case, though? When there are no targets to build the main thread should only walk the build graph once. In any case, it's worth doing.
I don't think a semaphore is the right primitive to use here (if I understand your code). Every time a rule completes, it signals one more walk of the build graph, so if multiple rules complete simultaneously before the main thread wakes up, the graph is walked twice in a row. Using a threading.Event object instead seems better: it just signals whether any rules completed since the last time the main thread was woken up.
Are all of the status_update() calls necessary? I think this would only need to be called after run_cmd() in the builder threads--this is mainly to signal to the main thread that the state of the filesystem has changed and the build graph should be re-walked. I believe these extra calls would contribute to excessive walks mentioned above due to using a semaphore.
I agree that builder threads looking at dependent rules once a rule is done would be nice. It might not even be too hard to do, but multithreading synchronization would be a bit tricky...
Caching of timestamps and .d files is good--looks nice!

Also, a couple other things not related to this issue:

Your commits cleaning up the output are nice, and I'd be fine merging them. They do currently break the non-verbose output though (the stdout_write() of building_text should only be done if progress_line is false). Also, does it work on Windows? That is, does os.isatty() return False for older Windows cmd.exe that doesn't support ANSI color? I don't have a machine to test on.
I think your commits are enough to add your name to the copyright if you'd like to.

Feb 03 '21 21:02 zwegner

Hello, I've checked the null build case again and found another problem: https://github.com/account-login/make.py/commit/8a83d39473e87983c5f5106bfc32725747c90603. When dealing rule with multiple targets, only one of the target is marked in the completed set, while all targets were marked in the visited set before, so those targets will never be checked until the next graph traversal (visited is cleared before graph traversal).

The excessive semaphore wake-up is a good catch. And some of the status_update() is probably not needed (if the above bug is fixed). I did not check the necessity of all the status_update() call back then. I just added those calls after the completed variable.

The color detection on windows is missing. Maybe we can copy something from https://github.com/django/django/blob/master/django/core/management/color.py.

Also, I think it is possible that adding a rule to the queue twice due to the lack of synchronization. And I'm in the process of reimplementing a subset of this tool, with proper synchronization, and walking the graph only once.

Feb 08 '21 17:02 account-login

make.py make.py copied to clipboard

Performance improvement

make.py
make.py copied to clipboard