fbarchard

Results: 83 comments by fbarchard

For Hexagon I implemented f32-vrndne using the Neon method, which is roundf() style. It's apparently not the intended rounding. It should be nearbyintf(), but the method would be similar for...

Usually the reads of a C8 kernel outperform those of a C4 kernel, so I suspect you're spilling registers? Consider c4s2, which is 4-element dot products with a rotate of 4...

SME is mainly interesting for its matrix multiply, which is GEMM and IGEMM for all datatypes. Hardware doesn't really exist yet, except maybe Apple M4, so when macOS supports SME...