Halide icon indicating copy to clipboard operation
Halide copied to clipboard

Make Halide::round behave as documented

Open abadams opened this issue 3 years ago • 21 comments

99% of the time, Halide::round does what it is documented to do, and what our tests verify that it does: It rounds to the nearest integer, with ties rounding to even.

However, if you have unvectorized code, and have been messing with the rounding mode flags, then it can behave differently. For vectorized code (on x86 at least) it explicitly uses the round-to-nearest-even flag on the roundps instruction.

This PR bumps that to 100%, by explicitly making it round ties to even in all cases, matching our docs and tests.

The current behavior just gave me a nasty Heisenbug where a test was passing on my dev machine and failing in CI because the rounding mode flag was somehow not the default one in the CI process.

Note that this means that Halide::round continues to behave differently to std::round on ties. Another option here would be to make the Halide rounding functions match the C standard library, excluding those that have behavior dependent on the rounding mode. This has a few problems though. First, it would break a bunch of Halide code, including our tests. Second, std::round makes the dubious choice to round ties away from zero, which is the one rounding mode not supported by x86 vector instructions. It's slow to emulate. I don't think we want to hand users that foot-gun. Third, the only way to round ties to even with the C standard library is by first setting the rounding mode flag. So if you exclude all problematic C standard library rounding modes, you're just left with ceil and floor. There's no sane round-to-nearest function. In our C backend implementation of round I just explicitly set the rounding mode and call nearbyintf.

I also looked to other languages for inspiration and discovered that in GLSL, round handles ties in an implementation-defined way, so (╯°□°)╯︵ ┻━┻

Not 100% sure I've done the right thing for hexagon, as it seems to have code that ensures a nearbyint implementation is available. I'm hoping CI will tell me if I got it wrong.

abadams avatar Sep 13 '22 23:09 abadams

Lots of broken things

steven-johnson avatar Sep 14 '22 16:09 steven-johnson

Yeah, the breakages are pre-existing: I added testing for tie-breaking behavior for our GPU backends, and it turns out they're all inconsistent right now. I'll work through them.

abadams avatar Sep 14 '22 16:09 abadams

I think the llvm 16 breakages are unrelated though

abadams avatar Sep 14 '22 16:09 abadams

Adding @pranavb-ca for HVX, @vksnk for C backend

steven-johnson avatar Sep 14 '22 17:09 steven-johnson

I think the llvm 16 breakages are unrelated though

Don't think so -- main tests cleanly for me on local x86-64-linux with a freshly built LLVM16

EDIT: or the LLVM nightlies on the buildbots got a broken pull maybe? Will rebuild them.

steven-johnson avatar Sep 14 '22 17:09 steven-johnson

At this point, all the buildbots should have rebuilt LLVM in place, so I think you have a real bug here

steven-johnson avatar Sep 14 '22 21:09 steven-johnson

On linux correctness_math:

dlopenbuf failed
undefined PLT symbol roundevenf (30) ./libhalide_kernels30.soterminate called after throwing an instance of 'Halide::RuntimeError'

steven-johnson avatar Sep 14 '22 21:09 steven-johnson

still failures that seem like non-flakes :-/

steven-johnson avatar Sep 15 '22 19:09 steven-johnson

Seems to be wasm float16 related. Does wasm support float16 at all?

abadams avatar Sep 15 '22 19:09 abadams

Seems to be wasm float16 related.

That doesn't explain (say) correctness_vector_reductions failing on 32-bit Windows (where we never test wasm): https://buildbot.halide-lang.org/master/#/builders/7/builds/386

Does wasm support float16 at all?

Nope.

steven-johnson avatar Sep 15 '22 19:09 steven-johnson

Yeah, it doesn't explain all the failures. There's also something weird going on with vector_reductions, even though that should be unaffected. One bug at a time...

abadams avatar Sep 15 '22 19:09 abadams

FYI: on OSX, vector_reductions eventually fails with HalideJITMemoryManager: unable to find address for _roundevenf (when running without sse41 or other simd)... unfortunately the error gets swallowed by "unhandled exception"

steven-johnson avatar Sep 16 '22 17:09 steven-johnson

On osx-arm:

FAILED TEST: vector_reductions
vector_reductions: Testing with target(arm-64-osx-jit)
Intrinsic name not mangled correctly for type arguments! Should be: llvm.aarch64.neon.uaddlp.v1i64.v2i32
ptr @llvm.aarch64.neon.uaddlp.i64.v2i32
Internal Error at /Users/halidenightly/build_bot/worker/halide-testbranch-main-llvm16-arm-64-osx-make/halide-source/src/CodeGen_LLVM.cpp:645 triggered by user code at : Condition failed: !verifyFunction(*function, &llvm::errs()): 

steven-johnson avatar Sep 20 '22 22:09 steven-johnson

failing for host-d3d12compute :-/

steven-johnson avatar Sep 22 '22 01:09 steven-johnson

Only remaining failure is one of the preexisting flakes on the windows builbot with the dodgy GPU

abadams avatar Sep 22 '22 17:09 abadams

Only remaining failure is one of the preexisting flakes on the windows builbot with the dodgy GPU

winbot1? If so I'm just take that one offline. It's stupid slow as well.

steven-johnson avatar Sep 22 '22 17:09 steven-johnson

Looks like win-worker-2. I noticed it flaking on AJ's PR as well: https://github.com/halide/Halide/pull/6884

abadams avatar Sep 22 '22 17:09 abadams

Looks like win-worker-2. I noticed it flaking on AJ's PR as well: #6884

Urg, so we basically have winbot1, which is less flaky but stupid-slow, and winbot2 which is slightly less slow but flaky, and winbot3, which is reasonable but we'd prefer not to have a single winbot

steven-johnson avatar Sep 22 '22 17:09 steven-johnson

LGTM, let me pull into google3 for some torture testing

steven-johnson avatar Sep 22 '22 19:09 steven-johnson

LGTM, let me pull into google3 for some torture testing

FYI, I'm failing on arm32-android builds with ld.lld: error: undefined symbol: roundevenf

steven-johnson avatar Sep 22 '22 19:09 steven-johnson

Annoyingly, arm32 has a round-to-nearest-even instruction, but llvm doesn't seem to want to emit it. I'll just lower it to bitwise ops on that target.

abadams avatar Sep 22 '22 19:09 abadams

OK, with that last change in place, i now have a fairly large number of failures that all seem to be variants on recursion-too-deep ending with SIGSEGV or similar; stand by

steven-johnson avatar Sep 23 '22 17:09 steven-johnson

#0  0x000055555ada3e9a in Halide::Internal::Simplify::visit(Halide::Internal::Add const*, Halide::Internal::Simplify::ExprInfo*) ()
#1  0x000055555ad95ab8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#2  0x000055555ae0095c in Halide::Internal::Simplify::visit(Halide::Internal::Load const*, Halide::Internal::Simplify::ExprInfo*) ()
#3  0x000055555ad95bc8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#4  0x000055555aeaf457 in Halide::Internal::Simplify::visit(Halide::Internal::Mul const*, Halide::Internal::Simplify::ExprInfo*) ()
#5  0x000055555ad95ae8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#6  0x000055555add37b0 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#7  0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#8  0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#9  0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#10 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#11 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#12 0x000055555adcf9dc in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#13 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#14 0x000055555aee8d55 in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#15 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide--Type <RET> for more, q to quit, c to continue without paging--
::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#16 0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#17 0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#18 0x000055555ada3edb in Halide::Internal::Simplify::visit(Halide::Internal::Add const*, Halide::Internal::Simplify::ExprInfo*) ()
#19 0x000055555ad95ab8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#20 0x000055555adda236 in Halide::Internal::Simplify::visit(Halide::Internal::Cast const*, Halide::Internal::Simplify::ExprInfo*) ()
#21 0x000055555ad95a91 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#22 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#23 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#24 0x000055555adda236 in Halide::Internal::Simplify::visit(Halide::Internal::Cast const*, Halide::Internal::Simplify::ExprInfo*) ()
#25 0x000055555ad95a91 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#26 0x000055555aee8d7b in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#27 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#28 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#29 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#30 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
--Type <RET> for more, q to quit, c to continue without paging--
#31 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#32 0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#33 0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#34 0x000055555aee8d7b in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#35 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#36 0x000055555ad94a88 in Halide::Internal::simplify(Halide::Expr const&, bool, Halide::Internal::Scope<Halide::Internal::Interval> const&, Halide::Internal::Scope<Halide::Internal::ModulusRemainder> const&) ()
#37 0x000055555aada86b in Halide::Internal::lower_round_to_nearest_ties_to_even(Halide::Expr const&) ()
#38 0x000055555aae6ea4 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#39 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#40 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#41 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#42 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#43 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#44 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#45 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#46 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#47 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#48 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#49 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
--Type <RET> for more, q to quit, c to continue without paging--
#50 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#51 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#52 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#53 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#54 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#55 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#56 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#57 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#58 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()

...

#63127 0x000055555aab4d86 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::LetStmt const*) ()
#63128 0x000055555aa9ca73 in Halide::Internal::CodeGen_LLVM::codegen(Halide::Internal::Stmt const&) ()
#63129 0x000055555aab7e89 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Block const*) ()
#63130 0x000055555aa975c8 in Halide::Internal::CodeGen_LLVM::compile_func(Halide::Internal::LoweredFunc const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> > const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> > const&) ()
#63131 0x00007ffff7d33aa0 in ?? () from /usr/grte/v5/lib64/libc.so.6
#63132 0x00007fffffffb490 in ?? ()
#63133 0x0000000000000000 in ?? ()

steven-johnson avatar Sep 23 '22 17:09 steven-johnson

@vksnk -- there's an xtensa-related issue that is (hopefully) the final blocker here; LMK when you are back in the office. (Basically: we are missing a float32->uint32 vector reinterpret, or, better yet, a vector round-to-even)

steven-johnson avatar Sep 23 '22 21:09 steven-johnson

OK, I think we're good to go.

steven-johnson avatar Sep 24 '22 00:09 steven-johnson

I want to wait for LLVM on x86-32 to get fixed before we land this; a proposed fix is pending now, so hopefully only another day or so

steven-johnson avatar Sep 26 '22 17:09 steven-johnson