stdarch
machine code for aarch64 `vcombine_` intrinsics may be suboptimal
Clang implements the `vcombine_` intrinsics using `shufflevector` (https://github.com/llvm-mirror/clang/blob/master/test/CodeGen/aarch64-neon-vcombine.c). I've done the same (https://github.com/gnzlbg/stdsimd/commit/b2fdeda18b1fb4c8b7c8706f48e0d2637dc4966b#diff-2e4ef22de80cb67140d6b5ea99acea70R627), but instead of getting this (https://godbolt.org/g/TVw4Mq) or this (https://godbolt.org/g/xJTMHe):
vcombine_f64(__Float64x1_t, __Float64x1_t): // @test_vcombine_f64(__Float64x1_t, __Float64x1_t)
mov v0.d[1], v1.d[0]
ret
I'm getting something like this:
disassembly for coresimd::coresimd::aarch64::neon::assert_vcombine_f32_dup::vcombine_f32_shim:
0: adrp x8, e4000 <byte_str.j.llvm.1524587332910266792+0x2d0>
1: ldr x8, [x8, #4008]
2: adrp x9, 99000 <byte_str.L.llvm.6299433742659787578+0x23>
3: add x9, x9, #0x84b
4: mov w10, #0x28 // #40
5: mov v0.d[1], v1.d[0]
6: stp x9, x10, [x8]
7: ret
Is it inlined into `vcombine_f32_shim` right? So isn't the rest of the code from that? What does the source code for `vcombine_f32_shim` look like?
@parched the source is here: https://github.com/gnzlbg/stdsimd/blob/table_lookup/coresimd/aarch64/neon.rs#L626
> Is it inlined into `vcombine_f32_shim` right? So isn't the rest of the code from that?
Might be. If they are inlined into the shim, the code should be that of the shim.
@gnzlbg I meant the actual shim function which I see is generated here. What does that expand to? Or, alternatively, what does the IR look like?
That would be this here: https://github.com/gnzlbg/stdsimd/blob/b699bef2cb285089f5f1f2c9f3305f6caab0833e/crates/assert-instr-macro/src/lib.rs#L111
It is basically a non-`#[inline]` function that just has the same arguments as the intrinsic, calls the intrinsic with them, and returns the intrinsic's result.
Ah ok, I see, thanks. The other instructions look to just be the setting of `_DONT_DEDUP`, ~although I'm not sure why it's storing `x10` too.~ oh, that's just the length.
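For anyone reading along, here is a rough, portable sketch of the shim pattern discussed above. It is not the code the `assert_instr` macro actually generates: `combine_f32`, `combine_f32_shim`, `DONT_DEDUP` and `last_shim_tag` are hypothetical stand-ins, and plain arrays replace the NEON vector types so it builds on any target. It only illustrates why the disassembly shows one instruction for the intrinsic plus a store of two registers: a `&str` is a pointer and a length, so assigning to the static writes both words.

```rust
// Hypothetical stand-in for the test crate's `_DONT_DEDUP` static.
// A `&str` is a (pointer, length) pair, so one assignment stores two words.
static mut DONT_DEDUP: &str = "";

// Portable stand-in for a `vcombine_*`-style intrinsic: it concatenates
// the low and high halves into one wider vector.
#[inline]
fn combine_f32(low: [f32; 2], high: [f32; 2]) -> [f32; 4] {
    [low[0], low[1], high[0], high[1]]
}

// The shim takes the same arguments as the intrinsic, is never inlined,
// tags itself by writing its name to the static, then forwards the call.
#[inline(never)]
fn combine_f32_shim(low: [f32; 2], high: [f32; 2]) -> [f32; 4] {
    // Assumption: a unique string per shim keeps identical shims from
    // being merged, which is what the static's name suggests.
    unsafe {
        DONT_DEDUP = "combine_f32_shim";
    }
    combine_f32(low, high)
}

// Reads the tag back by value (no reference to the `static mut` is taken).
fn last_shim_tag() -> &'static str {
    unsafe { DONT_DEDUP }
}
```

Calling `combine_f32_shim([1.0, 2.0], [3.0, 4.0])` returns the four lanes in order, and `last_shim_tag()` then reports the shim's name. In the disassembly above, the equivalent of the static write would be the `adrp`/`ldr`/`add`/`mov w10`/`stp` sequence, leaving the single `mov v0.d[1], v1.d[0]` as the intrinsic itself.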
`rustc` now generates the same instructions as clang.
https://rust.godbolt.org/z/xYKd97cW6
Great!