Cranelift: simplify opcode set by removing bitwise operations on floats
👋 Hey,
Jun Ryung Ju (sorry, I don't know the GitHub user) discovered on Zulip that bitwise operations on floats are unimplemented on AArch64. Upon further investigation, they appear to be unimplemented on all backends.
@bjorn3 also mentions that cg_clif does not use them; it does a bitcast and then performs the operation on the integer version (see the sketch after the list below).
This makes them a good candidate for removal, simplifying our opcode set.
Is there any benefit / optimization that we can do on these that would justify keeping them around?
The bitwise ops that I'm referring to are:
- band
- bor
- bxor
- bnot
- band_not
- bor_not
- bxor_not
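For reference, a minimal sketch of that bitcast-based pattern for a scalar band (hypothetical function name; the exact bitcast syntax varies across Cranelift versions):

function %band_f32_via_ints(f32, f32) -> f32 {
block0(v0: f32, v1: f32):
    ;; reinterpret both floats as i32, do the bitwise op there, and cast back
    v2 = bitcast.i32 v0
    v3 = bitcast.i32 v1
    v4 = band v2, v3
    v5 = bitcast.f32 v4
    return v5
}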
cc: @cfallin
+1 remove.
Perhaps optimizations like #4803 would do better if we keep bitwise CLIF operations on floats—we could express the optimization in the mid-end without introducing extra bitcast instructions.
That said, looking more closely, I think the best code sequence for that particular optimization is different enough across backends that we may not want to do it in the mid-end anyway. And it's niche enough to be a really weak argument for keeping bitwise operators on floats.
By contrast, I find the opposite argument compelling: they're complicated to lower on probably any architecture with a hardware floating-point unit, and nobody has wanted them yet.
So I'm also in favor of removing them.
@afonso360 You are missing band_not in your list.
Those operations are actually straightforward to implement on AArch64, since there is a hardware floating-point unit iff SIMD is supported; only bxor_not is a 2-instruction sequence (instead of 1). However, I probably also lean towards removing them, given the niche use case.
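For concreteness, a rough mapping onto the AArch64 vector forms, treating the operands as 16 bytes (my sketch, not actual backend output):

band     -> and v0.16b, v1.16b, v2.16b
bor      -> orr v0.16b, v1.16b, v2.16b
bxor     -> eor v0.16b, v1.16b, v2.16b
bnot     -> not v0.16b, v1.16b
band_not -> bic v0.16b, v1.16b, v2.16b
bor_not  -> orn v0.16b, v1.16b, v2.16b
bxor_not -> eor v0.16b, v1.16b, v2.16b; not v0.16b, v0.16b   (the 2-instruction case)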
Added, Thanks!
Agreed, I think we should remove them. A floating-point type imbues some meaning on the value such that it's not meant to be used (or is not typically used) as a "bucket of bits" type, as integer types are.
I suspect that it would be possible to do reasonable lowerings for these on x86-64 too (since we use XMM regs and would have the integer-vec instructions at our disposal, just as on aarch64), so that angle doesn't hold as much significance for me. However, the "only build what is needed" angle does: if no one actually needs these op/type combinations and they are extra lowerings to add, maintain, and test, then let's not have them IMHO.
It turns out that Wasmtime can actually generate SIMD bitwise ops on floating-point vectors, as in this example (from tests/misc_testsuite/simd/issue_3327_bnot_lowering.wast and #3327):
(module
  (func $v128_not (export "v128_not") (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
)

(assert_return (invoke "v128_not") (v128.const i32x4 -1 -1 -1 -1))
That currently compiles to this CLIF:
function u0:0(i64 vmctx) -> i8x16 fast {
    const0 = 0x00000000000000000000000000000000

block0(v0: i64):
@0026   v2 = vconst.i8x16 const0
@0038   v3 = raw_bitcast.f32x4 v2  ; v2 = const0
@0038   v4 = fabs v3
@003b   v5 = bnot v4
@003d   v6 = raw_bitcast.i8x16 v5
@003d   jump block1(v6)

block1(v1: i8x16):
@003d   return v1
}
Do we want cranelift-wasm to insert a bitcast in between the fabs and bnot here?
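Something like this, sketching only the affected lines (v7 is a hypothetical value number):

@0038 v4 = fabs v3
      v7 = raw_bitcast.i32x4 v4
@003b v5 = bnot v7
@003d v6 = raw_bitcast.i8x16 v5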
I would prefer fewer raw_bitcasts if at all possible; I'm not very fond of them...
Ah, interesting! That actually does change the calculus a bit in my mind. Given the principle above (only build what is needed) as the higher-priority one, I think it's totally fine to keep this support. Then for consistency it probably makes sense to support scalar floats as well; otherwise we have support for scalar ints, and all kinds of vectors, but not scalar floats, which is an odd and surprising hole.
I'm closing this issue as we seem to have reached consensus on keeping bitwise operations on floats.
Ok. I am currently working on implementing float bitops on x86_64. It looks like implementing the bitwise operations using bitcasts results in terrible code:
// function %band_f32(f32, f32) -> f32 {
// block0(v0: f32, v1: f32):
//   v2 = band v0, v1
//   return v2
// }

VCode {
  Entry block: 0
  v130 := v135
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 8)
  Inst 0: args %v128=%xmm0 %v129=%xmm1
  Inst 1: movd %v128, %v132l
  Inst 2: movd %v129, %v133l
  Inst 3: andl %v132l, %v133l, %v134l
  Inst 4: movd %v134l, %v135
  Inst 5: movaps %v130, %v131
  Inst 6: movaps %v131, %xmm0
  Inst 7: ret
}
~~I think we need to make F32 and F64 select Gpr instead of Xmm registers.~~ ~~Is this possible, or should I make an optimization for scalar float memory ops?~~
Ok, never mind. I think I just need to lower the scalar bitops to XMM bitops.
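A rough sketch of that XMM-based lowering (my assumption, not actual backend output): keep the floats in XMM registers and use the packed bitwise instructions directly instead of bouncing through GPRs.

band v0, v1  =>  andps xmm0, xmm1
bor  v0, v1  =>  orps  xmm0, xmm1
bxor v0, v1  =>  xorps xmm0, xmm1
bnot v0      =>  pcmpeqd xmm_tmp, xmm_tmp   ; materialize all-ones
                 xorps   xmm0, xmm_tmp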