gemm
Hi. I am trying to compile a project that uses the Candle ML framework for the aarch64-apple-ios target. Candle uses gemm as a dependency, so I get the following compilation error: ```...
Hello fellow gemm optimizer enthusiast, it would be extremely useful to provide benchmark utilities, ideally reporting GFlop/s or TFlop/s, to compare with other frameworks and with the CPU's peak theoretical throughput...
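As a rough illustration of the metric this request asks for, here is a minimal sketch of how GFlop/s is usually derived for a matrix multiplication: an `m x k` by `k x n` product costs `2 * m * n * k` floating-point operations (one multiply and one add per inner-product term), divided by the elapsed time. The function names `matmul_flops`, `gflops`, `naive_matmul`, and `bench_naive` are hypothetical and are not part of gemm's API; the naive kernel is only a stand-in for whatever kernel would actually be measured.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Floating-point operations for C = A * B with A: m x k and B: k x n.
/// Each of the m * n outputs needs k multiplies and k adds.
fn matmul_flops(m: usize, n: usize, k: usize) -> f64 {
    2.0 * m as f64 * n as f64 * k as f64
}

/// Throughput in GFlop/s for one m x n x k matmul that took `seconds`.
fn gflops(m: usize, n: usize, k: usize, seconds: f64) -> f64 {
    matmul_flops(m, n, k) / seconds / 1e9
}

/// Naive row-major reference matmul (hypothetical stand-in for the
/// kernel under test; it is not how gemm computes the product).
fn naive_matmul(a: &[f32], b: &[f32], c: &mut [f32], m: usize, n: usize, k: usize) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Times `reps` runs of the naive kernel and returns the mean GFlop/s.
fn bench_naive(m: usize, n: usize, k: usize, reps: u32) -> f64 {
    let a = vec![1.0f32; m * k];
    let b = vec![1.0f32; k * n];
    let mut c = vec![0.0f32; m * n];
    let start = Instant::now();
    for _ in 0..reps {
        naive_matmul(&a, &b, &mut c, m, n, k);
        // Keep the optimizer from discarding the result.
        black_box(&c);
    }
    let seconds_per_run = start.elapsed().as_secs_f64() / reps as f64;
    gflops(m, n, k, seconds_per_run)
}
```

Calling `bench_naive(512, 512, 512, 10)` in a release build would give a baseline figure that can be set against a tuned kernel or against the CPU's theoretical peak.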
It would be great if there were, for every function, a low-level API that exposes the required workspace, so that the function itself can avoid any allocations. The work is...
Coming here after noticing that CPU inference in the llama example over at candle only utilizes 10% of my CPU (AMD Ryzen 5800X3D). As I mentioned over at the candle...
I was wondering if this change would be of interest. It removes the if-else-panic and lets the type system take care of that constraint instead.
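The snippet does not say which constraint the change covers, so the following is only a generic sketch of the pattern it describes, not the actual diff. It assumes the constraint is "the element type must be one of a fixed set"; the names `elem_size_checked`, `Element`, and `elem_size` are hypothetical.

```rust
use std::any::TypeId;

/// "Before" style: the constraint is checked at run time, and an
/// unsupported type only fails when the function is actually called.
fn elem_size_checked<T: 'static>() -> usize {
    if TypeId::of::<T>() == TypeId::of::<f32>() {
        4
    } else if TypeId::of::<T>() == TypeId::of::<f64>() {
        8
    } else {
        panic!("unsupported element type")
    }
}

/// "After" style: the constraint is a trait bound, so an unsupported
/// type is rejected by the compiler instead of panicking.
trait Element {
    const SIZE: usize;
}

impl Element for f32 {
    const SIZE: usize = 4;
}

impl Element for f64 {
    const SIZE: usize = 8;
}

fn elem_size<T: Element>() -> usize {
    T::SIZE
}
```

With the trait-bound version, `elem_size::<u8>()` does not compile, whereas `elem_size_checked::<u8>()` compiles and panics at run time.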
I'm not sure that this change is optimal by any means, but it does yield a significant improvement when running relatively small matmuls on a 48-core machine. Before: ```...
F32 and F64. F32 was tested. Weirdly, the F64 SIMD code is flagged as dead code, but it is actually operational. Should fix: https://github.com/sarah-ek/gemm/issues/3 It's indeed compile-time SIMD detection...
Hey, opening an issue instead of a PR for this one because it's super dirty work atm. Basically, on NEON aarch64 (M1 Mac) we can add pure f16 intrinsics and...