
[Tentative] Adding new intrinsics for gemm.

Open · Narsil opened this issue 1 year ago · 1 comment

Hi there.

I am attempting to port ggml's matrix multiplication into a standalone crate: https://github.com/Narsil/ggblas

For most of the operations, I was able to leverage the existing intrinsics: https://doc.rust-lang.org/core/arch/arm/index.html. However, for M1 (so Arm aarch64), some of the SIMD f16 intrinsics from the Arm reference are missing:

https://developer.arm.com/documentation/101028/0012/13--Advanced-SIMD--Neon--intrinsics
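For illustration, here is a minimal scalar sketch of the kind of work those f16 intrinsics would accelerate: widening binary16 values to f32 and accumulating a dot product. The names `f16_bits_to_f32` and `dot_f16` are hypothetical and are not taken from ggblas or half; this is a portable reference under those assumptions, not a NEON implementation.

```rust
/// 2^-24, the value of the smallest positive binary16 subnormal.
const TWO_POW_NEG_24: f32 = 1.0 / 16_777_216.0;

/// Convert raw IEEE 754 binary16 bits to f32 (hypothetical scalar fallback).
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x03ff) as u32;
    let bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign,
        // Subnormal: the magnitude is frac * 2^-24, which is exact in f32.
        (0, _) => sign | (frac as f32 * TWO_POW_NEG_24).to_bits(),
        // Infinity.
        (0x1f, 0) => sign | 0x7f80_0000,
        // NaN: force the quiet bit and keep the payload.
        (0x1f, _) => sign | 0x7fc0_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127 and widen the fraction.
        _ => sign | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// Dot product of two slices of binary16 bit patterns, accumulated in f32.
fn dot_f16(a: &[u16], b: &[u16]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(&x, &y)| f16_bits_to_f32(x) * f16_bits_to_f32(y))
        .sum()
}
```

A SIMD version would process several lanes per instruction instead of one element at a time, which is where dedicated f16 intrinsics would come in.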

I'm not sure the approach I suggest here is viable; my understanding of low-level primitives such as these is fairly limited.

Happy to run a more complete set of operations if this is indeed deemed interesting.

It seems the proper implementation in the compiler itself would be something like: https://github.com/rust-lang/stdarch/issues/344

That's why I felt the intrinsics would have their place here.

Cheers!

Other refs: https://github.com/rust-lang/rfcs/pull/3451

Narsil, Jul 06 '23 20:07

I'm fine with putting these in the crate. Just make sure they don't overlap with the existing aarch64 assembly in the crate, and if there is any overlap, make the existing code use the new names.

However, I don't want to publicly expose the binary16 module; that's an internal structural implementation detail. Perhaps just expose these at half::arch::aarch64?
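As a rough sketch of that layout, the implementation can stay in a private module while a public `arch::aarch64` module re-exports only the chosen items. The function `f16x8_lane_count` is a hypothetical placeholder, not an existing item in half, and a real version would presumably gate the architecture modules with `#[cfg(target_arch = "aarch64")]`; the gate is left out here so the sketch compiles on any target.

```rust
// Private implementation module: never named in the public API.
mod binary16 {
    pub(crate) mod aarch64 {
        /// Hypothetical placeholder standing in for an intrinsic wrapper.
        /// A 128-bit NEON register holds eight 16-bit lanes.
        pub fn f16x8_lane_count() -> usize {
            8
        }
    }
}

// Public facade: the only path downstream users would see.
pub mod arch {
    pub mod aarch64 {
        // Re-export selected items without exposing `binary16` itself.
        pub use super::super::binary16::aarch64::f16x8_lane_count;
    }
}
```

With this shape, downstream code would refer to `arch::aarch64::f16x8_lane_count`, and the internal module could be reorganized later without changing the public path.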

starkat99, Aug 05 '23 00:08