Make Halide::round behave as documented
99% of the time, Halide::round does what it is documented to do, and what our tests verify that it does: It rounds to the nearest integer, with ties rounding to even.
However, if you have unvectorized code, and have been messing with the rounding mode flags, then it can behave differently. For vectorized code (on x86 at least) it explicitly uses the round-to-nearest-even flag on the roundps instruction.
This PR bumps that to 100%, by explicitly making it round ties to even in all cases, matching our docs and tests.
The current behavior just gave me a nasty Heisenbug where a test was passing on my dev machine and failing in CI because the rounding mode flag was somehow not the default one in the CI process.
Note that this means that Halide::round continues to behave differently to std::round on ties. Another option here would be to make the Halide rounding functions match the C standard library, excluding those that have behavior dependent on the rounding mode. This has a few problems though. First, it would break a bunch of Halide code, including our tests. Second, std::round makes the dubious choice to round ties away from zero, which is the one rounding mode not supported by x86 vector instructions. It's slow to emulate. I don't think we want to hand users that foot-gun. Third, the only way to round ties to even with the C standard library is by first setting the rounding mode flag. So if you exclude all problematic C standard library rounding modes, you're just left with ceil and floor. There's no sane round-to-nearest function. In our C backend implementation of round I just explicitly set the rounding mode and call nearbyintf.
I also looked to other languages for inspiration and discovered that in GLSL, round handles ties in an implementation-defined way, so (╯°□°)╯︵ ┻━┻
Not 100% sure I've done the right thing for hexagon, as it seems to have code that ensures a nearbyint implementation is available. I'm hoping CI will tell me if I got it wrong.
Lots of broken things
Yeah, the breakages are pre-existing: I added testing for tie-breaking behavior for our GPU backends, and it turns out they're all inconsistent right now. I'll work through them.
I think the llvm 16 breakages are unrelated though
Adding @pranavb-ca for HVX, @vksnk for C backend
I think the llvm 16 breakages are unrelated though
Don't think so -- main tests cleanly for me on local x86-64-linux with a freshly built LLVM16
EDIT: or the LLVM nightlies on the buildbots got a broken pull maybe? Will rebuild them.
At this point, all the buildbots should have rebuilt LLVM in place, so I think you have a real bug here
On linux correctness_math:
dlopenbuf failed
undefined PLT symbol roundevenf (30) ./libhalide_kernels30.soterminate called after throwing an instance of 'Halide::RuntimeError'
still failures that seem like non-flakes :-/
Seems to be wasm float16 related. Does wasm support float16 at all?
Seems to be wasm float16 related.
That doesn't explain (say) correctness_vector_reductions failing on 32-bit Windows (where we never test wasm): https://buildbot.halide-lang.org/master/#/builders/7/builds/386
Does wasm support float16 at all?
Nope.
Yeah, it doesn't explain all the failures. There's also something weird going on with vector_reductions, even though that should be unaffected. One bug at a time...
FYI: on OSX, vector_reductions eventually fails with HalideJITMemoryManager: unable to find address for _roundevenf (when running without sse41 or other simd)... unfortunately the error gets swallowed by "unhandled exception"
On osx-arm:
FAILED TEST: vector_reductions
vector_reductions: Testing with target(arm-64-osx-jit)
Intrinsic name not mangled correctly for type arguments! Should be: llvm.aarch64.neon.uaddlp.v1i64.v2i32
ptr @llvm.aarch64.neon.uaddlp.i64.v2i32
Internal Error at /Users/halidenightly/build_bot/worker/halide-testbranch-main-llvm16-arm-64-osx-make/halide-source/src/CodeGen_LLVM.cpp:645 triggered by user code at : Condition failed: !verifyFunction(*function, &llvm::errs()):
failing for host-d3d12compute :-/
Only remaining failure is one of the preexisting flakes on the windows builbot with the dodgy GPU
Only remaining failure is one of the preexisting flakes on the windows builbot with the dodgy GPU
winbot1? If so I'm just take that one offline. It's stupid slow as well.
Looks like win-worker-2. I noticed it flaking on AJ's PR as well: https://github.com/halide/Halide/pull/6884
Looks like win-worker-2. I noticed it flaking on AJ's PR as well: #6884
Urg, so we basically have winbot1, which is less flaky but stupid-slow, and winbot2 which is slightly less slow but flaky, and winbot3, which is reasonable but we'd prefer not to have a single winbot
LGTM, let me pull into google3 for some torture testing
LGTM, let me pull into google3 for some torture testing
FYI, I'm failing on arm32-android builds with ld.lld: error: undefined symbol: roundevenf
Annoyingly, arm32 has a round-to-nearest-even instruction, but llvm doesn't seem to want to emit it. I'll just lower it to bitwise ops on that target.
OK, with that last change in place, i now have a fairly large number of failures that all seem to be variants on recursion-too-deep ending with SIGSEGV or similar; stand by
#0 0x000055555ada3e9a in Halide::Internal::Simplify::visit(Halide::Internal::Add const*, Halide::Internal::Simplify::ExprInfo*) ()
#1 0x000055555ad95ab8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#2 0x000055555ae0095c in Halide::Internal::Simplify::visit(Halide::Internal::Load const*, Halide::Internal::Simplify::ExprInfo*) ()
#3 0x000055555ad95bc8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#4 0x000055555aeaf457 in Halide::Internal::Simplify::visit(Halide::Internal::Mul const*, Halide::Internal::Simplify::ExprInfo*) ()
#5 0x000055555ad95ae8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#6 0x000055555add37b0 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#7 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#8 0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#9 0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#10 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#11 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#12 0x000055555adcf9dc in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#13 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#14 0x000055555aee8d55 in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#15 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide--Type <RET> for more, q to quit, c to continue without paging--
::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#16 0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#17 0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#18 0x000055555ada3edb in Halide::Internal::Simplify::visit(Halide::Internal::Add const*, Halide::Internal::Simplify::ExprInfo*) ()
#19 0x000055555ad95ab8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#20 0x000055555adda236 in Halide::Internal::Simplify::visit(Halide::Internal::Cast const*, Halide::Internal::Simplify::ExprInfo*) ()
#21 0x000055555ad95a91 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#22 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#23 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#24 0x000055555adda236 in Halide::Internal::Simplify::visit(Halide::Internal::Cast const*, Halide::Internal::Simplify::ExprInfo*) ()
#25 0x000055555ad95a91 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#26 0x000055555aee8d7b in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#27 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#28 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#29 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#30 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
--Type <RET> for more, q to quit, c to continue without paging--
#31 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#32 0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#33 0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#34 0x000055555aee8d7b in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#35 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#36 0x000055555ad94a88 in Halide::Internal::simplify(Halide::Expr const&, bool, Halide::Internal::Scope<Halide::Internal::Interval> const&, Halide::Internal::Scope<Halide::Internal::ModulusRemainder> const&) ()
#37 0x000055555aada86b in Halide::Internal::lower_round_to_nearest_ties_to_even(Halide::Expr const&) ()
#38 0x000055555aae6ea4 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#39 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#40 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#41 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#42 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#43 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#44 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#45 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#46 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#47 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#48 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#49 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
--Type <RET> for more, q to quit, c to continue without paging--
#50 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#51 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#52 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#53 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#54 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#55 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#56 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#57 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#58 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
...
#63127 0x000055555aab4d86 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::LetStmt const*) ()
#63128 0x000055555aa9ca73 in Halide::Internal::CodeGen_LLVM::codegen(Halide::Internal::Stmt const&) ()
#63129 0x000055555aab7e89 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Block const*) ()
#63130 0x000055555aa975c8 in Halide::Internal::CodeGen_LLVM::compile_func(Halide::Internal::LoweredFunc const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> > const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> > const&) ()
#63131 0x00007ffff7d33aa0 in ?? () from /usr/grte/v5/lib64/libc.so.6
#63132 0x00007fffffffb490 in ?? ()
#63133 0x0000000000000000 in ?? ()
@vksnk -- there's an xtensa-related issue that is (hopefully) the final blocker here; LMK when you are back in the office. (Basically: we are missing a float32->uint32 vector reinterpret, or, better yet, a vector round-to-even)
OK, I think we're good to go.
I want to wait for LLVM on x86-32 to get fixed before we land this; a proposed fix is pending now, so hopefully only another day or so