fbarchard

Results: 85 comments by fbarchard

For Hexagon I implemented f32-vrndne using the Neon method, which is roundf() style. It's apparently not the intended rounding; it should be nearbyintf(), but the method would be similar for...

Usually the reads of a C8 kernel outperform those of a C4 kernel, so I suspect you're spilling registers. Consider c4s2, which is 4-element dot products with a rotate of 4...
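One way to picture the rotate idea: instead of broadcasting A, each 32-bit lane does a 4-element int8 dot product against packed B, then the A register is rotated by one lane (4 bytes) so every lane eventually sees every group of A. The scalar C model below is a hypothetical sketch under stated assumptions: a 1x4 output tile, 16 values of K, and a B packing chosen to match the rotation. The function names are illustrative, and the lane count, number of shuffles, and packing in the actual c4s2 kernels may differ.

```c
#include <stdint.h>

// Hypothetical packing: at rotation step r, output column n (SIMD lane n)
// holds A group (n + r) % 4, so B is stored in that same order.
// b is 16 x 4 (k rows, n columns); each group covers 4 consecutive k values.
static void pack_b_rotated(int8_t b[16][4], int8_t b_packed[4][4][4]) {
  for (int r = 0; r < 4; ++r)
    for (int n = 0; n < 4; ++n) {
      int g = (n + r) % 4;
      for (int i = 0; i < 4; ++i)
        b_packed[r][n][i] = b[4 * g + i][n];
    }
}

// Hypothetical scalar model of a 1x4 "c4 + rotate" int8 GEMM inner loop.
// a:   16 int8 values of one row of A, viewed as 4 groups of 4.
// acc: 4 int32 accumulators, one per output column.
static void gemm_1x4_c4_rotate(const int8_t a[16], int8_t b_packed[4][4][4],
                               int32_t acc[4]) {
  for (int r = 0; r < 4; ++r) {      // one rotation of the A register per step
    for (int n = 0; n < 4; ++n) {    // SIMD lanes
      int g = (n + r) % 4;           // A group currently sitting in lane n
      int32_t dot = 0;
      for (int i = 0; i < 4; ++i)    // 4-element dot product per lane
        dot += (int32_t)a[4 * g + i] * b_packed[r][n][i];
      acc[n] += dot;
    }
  }
}
```

After four rotations each lane has visited all four A groups exactly once, so `acc[n]` equals the full dot product of the A row with column `n` of B while only one A register was loaded.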

SME is mainly interesting for its matrix multiply, that is, GEMM and IGEMM for all datatypes. Hardware doesn't really exist yet, except maybe Apple M4, so when macOS supports SME...
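SME's matrix instructions (such as FMOPA) compute the outer product of two vectors and accumulate it into the ZA tile, so GEMM is expressed as one outer product per step of K. The scalar model below is a reference sketch of that formulation only. It is not SME code, and the function name `gemm_outer_product` is hypothetical.

```c
#include <stddef.h>

// Scalar model of GEMM as a sum of outer products: for each k step, the
// m x n tile C accumulates (column p of A) x (row p of B), which is the
// shape of work an SME outer-product instruction performs on a ZA tile.
static void gemm_outer_product(size_t m, size_t n, size_t k,
                               const float *a,  // m x k, row-major
                               const float *b,  // k x n, row-major
                               float *c) {      // m x n, row-major, accumulated
  for (size_t p = 0; p < k; ++p)        // one outer product per k step
    for (size_t i = 0; i < m; ++i)
      for (size_t j = 0; j < n; ++j)
        c[i * n + j] += a[i * k + p] * b[p * n + j];
}
```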

Needs another reviewer to approve

> @rrwinterton Can you resolve the clang-format errors and rebase? I can merge once checks are green. The CMake failures are caused by CMake 4.x dropping support for older...