Disable LLVM loop optimization by default (Issue #4113)
Up to now, Halide has defaulted to running LLVM's own loop optimization passes. These passes are designed for typical human-written C/C++ code (and do a good job there), but recent work suggests they are rarely useful for Halide-produced code: they tend to increase code size and/or compile time without meaningfully improving performance. For that reason, Halide should flip this to be opt-in instead of opt-out.
This change removes the DisableLLVMLoopOpt flag entirely. You can still opt-in to the old behavior via EnableLLVMLoopOpt.
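For reference, a minimal sketch of opting back in after this change (assuming the feature keeps the enum name `Target::EnableLLVMLoopOpt` in Target.h, matching the `enable_llvm_loop_opt` target string):

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    // With this change, LLVM loop optimization is off by default.
    // Opt back in via the target feature (equivalently, append
    // "-enable_llvm_loop_opt" to the target string, e.g. via HL_TARGET):
    Target t = get_host_target().with_feature(Target::EnableLLVMLoopOpt);

    Func f("f");
    Var x("x");
    f(x) = x * 2;
    f.compile_jit(t);  // JIT-compile with LLVM loop optimization re-enabled
    return 0;
}
```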
(Landing this assumes that most downstream users are OK with this change. I've already made the appropriate changes inside Google and we can land this anytime from our perspective.)
I'm worried about discoverability of this change for people currently using the flag, since disable_llvm_loop_opt will suddenly become an invalid target feature. Do you think we should briefly keep the disable_llvm_loop_opt flag and have it emit a warning that this behavior is now the default, and that enable_llvm_loop_opt restores the old behavior?
Or perhaps the only people using this flag are insiders anyway.
> Or perhaps the only people using this flag are insiders anyway.
This is what I'd suspect. But maybe announcing this on halide-dev@ and Gitter to get more input would be worthwhile. (We don't need to be in a hurry to land this.)
Hm, maybe this really is moving the needle in this case:
```
cd bin/build/tmp ; /home/halidenightly/build_bot/worker/linux-64-gcc7-testbranch-trunk/halide-Release/bin/performance_boundary_conditions
unbounded : 88274.299000 us
constant_exterior : 1574301.813000 us
Error: constant_exterior is 17.834204 times slower than unbounded
/home/halidenightly/build_bot/worker/linux-64-gcc7-testbranch-trunk/halide/Makefile:1921: recipe for target 'performance_boundary_conditions' failed
make: *** [performance_boundary_conditions] Error 255
```
I can't repro that slowdown locally. Probably a flake.
The sum helper is going to cause problems. You want to unroll these reductions when they're small, but the helper doesn't give you access to the Func to schedule them. Right now it's fine because LLVM unrolls them anyway. We may need to redesign the helper.
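To illustrate the concern, a sketch (assuming a user-defined `in` Func and a small 4-wide RDom):

```cpp
// The sum() helper builds an anonymous reduction Func internally, so
// there is no handle on which to call .unroll():
Func f("f"), in("in");
Var x("x");
RDom r(0, 4);
in(x) = x;
f(x) = sum(in(x + r));  // small reduction; previously LLVM would unroll it

// Writing the reduction out by hand exposes a schedulable Func:
Func g("g");
g(x) = 0;
g(x) += in(x + r);
g.update().unroll(r.x);  // explicit unroll, no longer relying on LLVM
```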
Bringing this back up to see if we (finally) want to land this for Halide 13.
I found that this target flag speeds up many of the correctness tests substantially (e.g. correctness_vector_reductions goes from 42s to 27s).
If that pattern holds (and I think it will for the correctness tests), this change could dramatically speed up buildbot testing.
Since we're about to cut a branch for Halide 12, I propose we land this for Halide 13, since it's a potentially-breaking change.
Sounds good, but I believe there are one or two things to merge into master before we have a complete 12.0 branch.
> Sounds good, but I believe there are one or two things to merge into master before we have a complete 12.0 branch.
Yep, just want to get it reviewed so we can land it after the 12 branch is cut
Oh sorry, I thought this was a different PR. I'm still concerned that this will cause regressions for any use of the sum helper over small reduction domains, where we can currently rely on LLVM unrolling them when it's sensible to do so. These regressions would be impossible to fix in the schedule because the sum helper sucks that way.
Should we try fixing those helpers first? I think the way to fix those helpers we agreed on last time it was brought up is to let them take the Func to use as the last arg instead of just a string name for the Func. Then code would be easier to update to explicitly unroll things.
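The proposed shape might look something like this (hypothetical signature; the current helper only accepts a name string, and the `helper` Func and overload here are assumptions for illustration):

```cpp
Func f("f"), in("in"), helper("helper");
Var x("x");
RDom r(0, 4);
in(x) = x;

// Hypothetical: sum() takes the Func to use as its last argument...
f(x) = sum(r, in(x + r), helper);

// ...so the caller can schedule the reduction explicitly:
helper.compute_at(f, x).update().unroll(r.x);
```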
> Oh sorry, I thought this was a different PR.
Probably thinking of https://github.com/halide/Halide/pull/5900 (which really should land regardless of this one, hint hint)
> Should we try fixing those helpers first?
Sure, but I don't recall the discussion you mention -- are you sure it was me?
It was a small part of a dev meeting a few months ago, not a direct discussion with you.