gemm issues

Compilation error when compiling to aarch64-apple-ios

2

Hi. I am trying to compile a project that uses the Candle ML framework for aarch64-apple-ios. Candle uses gemm as a dependency, so I get the following compilation error: ```...

santiagomed

Provide benchmark with throughput units (GFlops/s TFlops/s)

1

Hello fellow gemm optimizer enthusiast, it would be extremely useful to provide benchmark utilities, ideally reporting GFlop/s or TFlop/s, to compare with other frameworks, compare with the CPU peak theoretical throughput...

mratsim
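Throughput for a dense GEMM is conventionally computed from the operation count 2·m·n·k (one multiply and one add per inner-product term) divided by wall-clock time. A minimal sketch of such a utility, assuming row-major f32 matrices; `gemm_flops`, `gflops_per_sec`, `naive_gemm` and `bench` are hypothetical names, and `naive_gemm` is only a stand-in kernel, not gemm's implementation:

```rust
use std::hint::black_box;
use std::time::Instant;

/// Flop count of C = A·B with A: m×k and B: k×n.
/// Each of the m·n outputs takes k multiplies and k adds: 2·m·n·k in total.
fn gemm_flops(m: usize, n: usize, k: usize) -> f64 {
    2.0 * m as f64 * n as f64 * k as f64
}

/// Convert a flop count and an elapsed time in seconds to GFLOP/s.
fn gflops_per_sec(flops: f64, seconds: f64) -> f64 {
    flops / seconds / 1e9
}

/// Hypothetical stand-in kernel: naive row-major GEMM that accumulates A·B into C.
fn naive_gemm(m: usize, n: usize, k: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    for i in 0..m {
        for p in 0..k {
            let aip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
}

/// Time `iters` runs of the stand-in kernel and report GFLOP/s for the fastest one.
fn bench(m: usize, n: usize, k: usize, iters: usize) -> f64 {
    let a = vec![1.0f32; m * k];
    let b = vec![1.0f32; k * n];
    let mut c = vec![0.0f32; m * n];
    let mut best = f64::INFINITY;
    for _ in 0..iters {
        let start = Instant::now();
        naive_gemm(m, n, k, &a, &b, &mut c);
        // Keep the optimizer from discarding the result.
        black_box(&c);
        best = best.min(start.elapsed().as_secs_f64());
    }
    gflops_per_sec(gemm_flops(m, n, k), best)
}
```

Reporting the best of several runs reduces noise from scheduling and cache warm-up. Comparing the figure against the CPU's peak would additionally require cores × clock × flops per cycle for the target machine, which this sketch does not attempt to detect.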

Low Level API with Pre Allocated Work Space Exposed

It would be great if every function had a low-level API that exposes the workspace it needs, so that the function itself performs no allocations. The work is...

RoyiAvital
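One common shape for such an API is a pair of functions: one that reports how much scratch memory a call needs, and one that takes a caller-owned buffer of at least that size. A sketch of that pattern under stated assumptions; the names `workspace_len`, `gemm_with_workspace` and `GemmError` are hypothetical and not part of gemm, and the workspace here simply holds one packed column of B:

```rust
/// Hypothetical error type: returned instead of allocating when the buffer is too small.
#[derive(Debug, PartialEq)]
enum GemmError {
    WorkspaceTooSmall { needed: usize, got: usize },
}

/// Hypothetical size query: number of f32 scratch elements the call below needs
/// (one packed column of B, i.e. k elements).
fn workspace_len(k: usize) -> usize {
    k
}

/// Hypothetical low-level entry point: C (m×n) = A (m×k) · B (k×n), all row-major.
/// All scratch memory comes from `workspace`, so this function never allocates.
fn gemm_with_workspace(
    m: usize,
    n: usize,
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    workspace: &mut [f32],
) -> Result<(), GemmError> {
    let needed = workspace_len(k);
    if workspace.len() < needed {
        return Err(GemmError::WorkspaceTooSmall { needed, got: workspace.len() });
    }
    let packed = &mut workspace[..needed];
    for j in 0..n {
        // Pack column j of B contiguously so the inner loop is a unit-stride dot product.
        for p in 0..k {
            packed[p] = b[p * n + j];
        }
        for i in 0..m {
            let row = &a[i * k..(i + 1) * k];
            c[i * n + j] = row.iter().zip(packed.iter()).map(|(x, y)| x * y).sum();
        }
    }
    Ok(())
}
```

The caller allocates the buffer once and can reuse it across many calls with the same or a smaller `k`, so repeated calls do no heap allocation.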

Candle example uses 10% of CPU when fma is active for x86

7

Coming here after noticing that CPU inference in the llama example over at Candle utilizes only 10% of my CPU (AMD Ryzen 5800X3D). As I mentioned over at the Candle...

kstavro

Add GemmType trait for dispatching gemm fn calls

2

I was wondering if this change would be of interest. It removes the if-else-panic and lets the type system take care of that constraint instead.

ivarflakstad
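A sketch of trait-based dispatch, assuming a hypothetical `GemmType` trait with one kernel per element type (the PR's actual trait may differ). Both impls call the same generic naive kernel here; in a real library each would call its own SIMD kernel:

```rust
use std::ops::{Add, Mul};

/// Generic naive kernel shared by the impls below: C = A·B, row-major.
fn naive_kernel<T>(m: usize, n: usize, k: usize, a: &[T], b: &[T], c: &mut [T])
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    for i in 0..m {
        for j in 0..n {
            let mut acc = T::default();
            for p in 0..k {
                acc = acc + a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Hypothetical trait: only types that implement it can be passed to `gemm`.
trait GemmType: Sized {
    /// Label for the kernel this type dispatches to (illustration only).
    const KERNEL: &'static str;
    fn gemm_kernel(m: usize, n: usize, k: usize, a: &[Self], b: &[Self], c: &mut [Self]);
}

impl GemmType for f32 {
    const KERNEL: &'static str = "f32";
    fn gemm_kernel(m: usize, n: usize, k: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
        naive_kernel(m, n, k, a, b, c);
    }
}

impl GemmType for f64 {
    const KERNEL: &'static str = "f64";
    fn gemm_kernel(m: usize, n: usize, k: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
        naive_kernel(m, n, k, a, b, c);
    }
}

/// Entry point: the `T: GemmType` bound replaces a runtime
/// "if f32 {..} else if f64 {..} else { panic!() }" chain.
fn gemm<T: GemmType>(m: usize, n: usize, k: usize, a: &[T], b: &[T], c: &mut [T]) {
    T::gemm_kernel(m, n, k, a, b, c);
}
```

Calling `gemm` with, say, `i32` slices is then rejected at compile time with an unsatisfied `GemmType` bound, instead of reaching a `panic!` at runtime.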

[DUMMY] F16 lane

Narsil

Fixing large multi-threading by chunking on fewer threads.

Narsil

This drastically improves the overthreading issue (>48 cores)

2

I'm not sure that this change is optimal by any means, but it does yield a significant improvement when running relatively small matmuls on a 48-core machine. Before: ```...

Narsil
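The idea of using fewer threads for small problems can be sketched as a heuristic that caps the thread count by the amount of work, then splits the rows of C into one contiguous chunk per thread. `MIN_FLOPS_PER_THREAD`, `choose_threads` and `parallel_gemm` are hypothetical, and the threshold value is an arbitrary assumption that would need tuning per machine:

```rust
use std::thread;

/// Hypothetical threshold: the least work (in flops) worth handing to one thread.
const MIN_FLOPS_PER_THREAD: usize = 1 << 20;

/// Cap the thread count by the available threads, by the number of rows of C,
/// and by the total work divided by MIN_FLOPS_PER_THREAD (always at least 1).
fn choose_threads(m: usize, n: usize, k: usize, max_threads: usize) -> usize {
    let flops = 2 * m * n * k;
    let by_work = (flops / MIN_FLOPS_PER_THREAD).max(1);
    by_work.min(max_threads).min(m).max(1)
}

/// C (m×n) = A (m×k) · B (k×n), row-major, with rows of C split across threads.
/// Returns the number of threads chosen.
fn parallel_gemm(
    m: usize,
    n: usize,
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    max_threads: usize,
) -> usize {
    let threads = choose_threads(m, n, k, max_threads);
    if m == 0 || n == 0 {
        return threads;
    }
    // Ceiling division so every row lands in some chunk.
    let rows_per_chunk = (m + threads - 1) / threads;
    thread::scope(|s| {
        for (idx, c_chunk) in c.chunks_mut(rows_per_chunk * n).enumerate() {
            let row0 = idx * rows_per_chunk;
            s.spawn(move || {
                for i in 0..c_chunk.len() / n {
                    for j in 0..n {
                        let mut acc = 0.0f32;
                        for p in 0..k {
                            acc += a[(row0 + i) * k + p] * b[p * n + j];
                        }
                        c_chunk[i * n + j] = acc;
                    }
                }
            });
        }
    });
    threads
}
```

For simplicity this sketch spawns scoped threads on every call; a library would normally reuse a thread pool, but the capping logic is the same.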

Adding SIMD128 for wasm.

F32 and F64. F32 was tested. Oddly, the F64 SIMD code is flagged as dead code, but it is actually operational. Should fix: https://github.com/sarah-ek/gemm/issues/3 It's indeed compile-time SIMD detection...

Narsil
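Compile-time SIMD selection in Rust can be sketched with `cfg` on `target_feature`: the SIMD128 path below is compiled only for `wasm32` when built with `-C target-feature=+simd128`, and a scalar fallback is compiled otherwise. This is an illustrative dot product using `std::arch::wasm32` intrinsics, not the code from the PR:

```rust
/// SIMD128 path: compiled only for wasm32 with the simd128 target feature enabled.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn dot(x: &[f32], y: &[f32]) -> f32 {
    use std::arch::wasm32::*;
    assert_eq!(x.len(), y.len());
    let chunks = x.len() / 4;
    let mut acc = f32x4_splat(0.0);
    for c in 0..chunks {
        let i = c * 4;
        // Build 4-lane vectors from the slices and accumulate lane-wise products.
        let xv = f32x4(x[i], x[i + 1], x[i + 2], x[i + 3]);
        let yv = f32x4(y[i], y[i + 1], y[i + 2], y[i + 3]);
        acc = f32x4_add(acc, f32x4_mul(xv, yv));
    }
    // Horizontal sum of the four lanes, then a scalar tail for leftover elements.
    let mut sum = f32x4_extract_lane::<0>(acc)
        + f32x4_extract_lane::<1>(acc)
        + f32x4_extract_lane::<2>(acc)
        + f32x4_extract_lane::<3>(acc);
    for i in chunks * 4..x.len() {
        sum += x[i] * y[i];
    }
    sum
}

/// Scalar fallback: compiled for every other target or feature set.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
fn dot(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len());
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}
```

Because the choice is made at compile time, a build without the feature flag contains only the scalar path.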

M1 f16 intrinsics

Hey, opening an issue instead of a PR for this one because it's super dirty work at the moment: basically, on NEON aarch64 (M1 Mac) we can add pure f16 intrinsics and...

Narsil
