Halide icon indicating copy to clipboard operation
Halide copied to clipboard

Rewrite the pass that adds mutexes for atomic nodes

Open abadams opened this issue 1 year ago • 1 comments

For O(n) nested allocate nodes, this pass was quadratic in n, even if there was no use of atomics. This commit rewrites it to use a linear-time algorithm, and skips it entirely after the first validation pass if there aren't any atomic nodes.

It also needlessly used IRGraphMutators, which slowed things down, didn't handle LargeBuffers (could overflow in the allocation), incorrectly thought every producer/consumer node was associated with an output buffer, and didn't print the realization name when printing the atomic node (the body of an atomic node is only atomic w.r.t. a specific realization).

I noticed all this because it stuck out in a profile. For resnet 50, the rewrite that changed to a linear algorithm took this stage from 185ms down to 6.7ms, and then skipping it entirely when it doesn't find any atomic nodes took it to 1.5ms for the single IRVisitor check.

For local laplacian with 100 pyramid levels (which contains many nested allocate nodes due to a large number of skip connections), the times are 5846 ms -> 16 ms -> 4.6 ms

This is built on top of #8103

abadams avatar Feb 18 '24 19:02 abadams

I changed the base to abadams/scope_improvements to see just the changes here

steven-johnson avatar Feb 18 '24 20:02 steven-johnson

Overnight test is good, ready to land (failure is irrelevant and will be fixed when https://github.com/llvm/llvm-project/pull/83151 lands)

steven-johnson avatar Feb 27 '24 18:02 steven-johnson