bevy icon indicating copy to clipboard operation
bevy copied to clipboard

unwrap_or_else(|| debug_unreachable_unchecked()) still generates jump instructions

Open james7132 opened this issue 3 years ago • 3 comments

What problem does this solve or what need does it fill?

When writing the core Fetch related code, there's a large amount of unwrap_or_else(|| debug_unreachable_unchecked() to suggest to the compiler that, in release builds, the Option is never None and unwrapping is always safe. This is supposed to hint to the compiler that this branch is dead.

However, when attempting to compare this pattern against an actual get_unchecked implementation for SparseSets seems to still generates two jump instructions (at least on x86 platforms). You can see this here: https://godbolt.org/z/e3EPE1vxW, the full assembly output has been copied below.

example::get_checked:
        mov     rax, qword ptr [rdi + 24]
        shl     rsi, 4
        mov     rax, qword ptr [rax + rsi + 8]
        shl     rax, 3
        add     rax, qword ptr [rdi]
        ret

example::get_unwrap_checked_unreachable:
        mov     rcx, rsi
        shl     rcx, 4
        add     rcx, qword ptr [rdi + 24]
        xor     eax, eax
        cmp     qword ptr [rdi + 40], rsi
        cmovbe  rcx, rax
        jbe     .LBB1_4
        cmp     qword ptr [rcx], 0
        je      .LBB1_2
        mov     rax, qword ptr [rcx + 8]
        shl     rax, 3
        add     rax, qword ptr [rdi]
.LBB1_4:
        ret
.LBB1_2:
        xor     eax, eax
        ret

What solution would you like?

Internal, pub(crate) get_unchecked(_mut) variants on SparseSet and derivative metadata stores and storages that is only used for release builds.

What alternative(s) have you considered?

Leaving this untouched. Eat the performance penalty.

james7132 avatar Jun 21 '22 10:06 james7132

A bit more digging, and it seems like the local Option<ThinSlicePtr<'_, T>> unwraps are just fine, they compile down to the same assembly, but anything that was fetched from a metadata store (i.e. Tables/SparseSets/etc) seems to be blocked off by the function boundary. It seems like it's inlining the function and then not doing further optimizations. This likely should go away with LTO or some other secondary optimization pass, but we probably shouldn't rely on that.

james7132 avatar Jun 21 '22 12:06 james7132

Upstreamed an issue for this https://github.com/rust-lang/rust/issues/98468

james7132 avatar Jun 24 '22 21:06 james7132

This seems to be fixed in the latest nightly: https://godbolt.org/z/9PzKWfY1K. Probably can close this when Rust 1.64 lands.

james7132 avatar Sep 12 '22 03:09 james7132

This seems to have been fixed as of 1.64's release.

james7132 avatar Oct 20 '22 07:10 james7132