half-rs
[Tentative] Adding new intrinsics for gemm.
Hi here.
I am attempting to port ggml's matrix multiplication into a standalone crate: https://github.com/Narsil/ggblas
For most of the operations, I was able to leverage the existing intrinsics: https://doc.rust-lang.org/core/arch/arm/index.html. However, for M1 (so ARM aarch64), some SIMD f16 intrinsics are missing.
https://developer.arm.com/documentation/101028/0012/13--Advanced-SIMD--Neon--intrinsics
I'm not sure whether the approach I suggest here is viable; my understanding of low-level primitives such as these is fairly limited.
Happy to run a more complete set of operations if this is indeed deemed interesting.
It seems the proper implementation in the compiler itself would be something like: https://github.com/rust-lang/stdarch/issues/344
That's why I felt the intrinsics would have their place here.
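To make the idea concrete, here is a minimal sketch (not the code from this PR) of how an f16 SIMD operation can be reached through inline assembly when no stable intrinsic is available. It assumes f16 values are carried as raw `u16` bits; the names `f16x4_to_f32x4` and `f16_bits_to_f32_soft` are hypothetical and exist only for this illustration. On aarch64 it emits `fcvtl` to widen four halves to four singles, and on other targets it falls back to a plain software conversion.

```rust
/// Software fallback: convert IEEE 754 binary16 bits to f32.
/// Hypothetical helper for this sketch, not part of the half crate's API.
pub fn f16_bits_to_f32_soft(h: u16) -> f32 {
    let sign = u32::from(h >> 15) << 31;
    let exp = u32::from((h >> 10) & 0x1f);
    let frac = u32::from(h & 0x3ff);
    match exp {
        // Zero and subnormals: magnitude is frac * 2^-24 (exact in f32).
        0 => {
            let magnitude = frac as f32 / 16_777_216.0;
            f32::from_bits(sign | magnitude.to_bits())
        }
        // Infinity and NaN: keep the payload, use the f32 all-ones exponent.
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)),
        // Normal numbers: rebias the exponent from 15 to 127.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

/// Widen four f16 values (as raw bits) to f32 using NEON `fcvtl`.
#[cfg(target_arch = "aarch64")]
pub fn f16x4_to_f32x4(v: [u16; 4]) -> [f32; 4] {
    use core::arch::aarch64::{float32x4_t, vld1_u16, vst1q_f32};
    use core::arch::asm;

    let mut out = [0.0f32; 4];
    // SAFETY: both pointers refer to local arrays of exactly the lane count
    // being loaded/stored, and the asm only touches its declared registers.
    unsafe {
        let halves = vld1_u16(v.as_ptr());
        let singles: float32x4_t;
        asm!(
            "fcvtl {0:v}.4s, {1:v}.4h",
            out(vreg) singles,
            in(vreg) halves,
            options(pure, nomem, nostack, preserves_flags),
        );
        vst1q_f32(out.as_mut_ptr(), singles);
    }
    out
}

/// Portable fallback with the same signature for non-aarch64 targets.
#[cfg(not(target_arch = "aarch64"))]
pub fn f16x4_to_f32x4(v: [u16; 4]) -> [f32; 4] {
    v.map(f16_bits_to_f32_soft)
}
```

Calling `f16x4_to_f32x4([0x3c00, 0xc000, 0x3800, 0x0000])` yields `[1.0, -2.0, 0.5, 0.0]` on either path. Arithmetic instructions such as a half-precision fused multiply-add would follow the same pattern but would additionally need the `fp16` target feature to be enabled and detected at runtime.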
Cheers!
Other refs: https://github.com/rust-lang/rfcs/pull/3451
I'm fine with putting these in the crate. Please make sure that the existing aarch64 assembly in the crate doesn't overlap, though, and if there is any overlap, make the existing code use the new names.
However, I don't want to publicly expose the binary16 module; that's an internal structural implementation detail. Perhaps just expose these at half::arch::aarch64?
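A rough sketch of that layout, assuming the implementation keeps living inside a private module and is only re-exported from the public path. The inner module name `neon` and the function `is_nan_bits` are hypothetical placeholders, and the real modules would be gated with `#[cfg(target_arch = "aarch64")]`; the gate is left out here so the snippet compiles on any target.

```rust
// Private implementation module: nothing here is reachable from outside the
// crate unless it is explicitly re-exported.
mod binary16 {
    // Hypothetical submodule holding the architecture-specific code.
    pub mod neon {
        /// Placeholder standing in for a real intrinsic wrapper:
        /// reports whether raw binary16 bits encode a NaN.
        pub fn is_nan_bits(bits: u16) -> bool {
            (bits & 0x7fff) > 0x7c00
        }
    }
}

// Public facade: users see `arch::aarch64::*` and never `binary16`.
pub mod arch {
    // In a real crate: #[cfg(target_arch = "aarch64")]
    pub mod aarch64 {
        pub use crate::binary16::neon::is_nan_bits;
    }
}
```

With this arrangement a caller writes `arch::aarch64::is_nan_bits(bits)` (or `half::arch::aarch64::...` from outside the crate), and the internal module can be reorganized later without breaking the public API.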