Alexander Heinecke

Results 18 issues of Alexander Heinecke

Right now the xgemm driver has only 2 flags for formats (a trans and b trans). However, with the latest additions we have various VNNI factors and formats. We therefore...

The unary GeLU TPP uses vblendvps when generating code for AVX512 VL=256. This can lead to hard-to-debug problems, as only the first 16 registers are valid for use....

Following up on #741 (which fixes some compute datatype CI testing): right now the use of bf16 as a compute datatype is very spotty (mainly as there is no hardware support)....

Has someone figured out what is needed (and whether it is possible at all) to support the McIntosh D1100? Playback right now happens via PCM conversion through ffmpeg to 32-bit/352.8 kHz :-( Thanks!

The Intel AMX TileConfig hoisting today uses the AllocaOp for the tileconfig state as the anchor op and then attempts to move IntelAMXTileConfigDispatchOp around based on a test if the AllocOp...

As discussed in #871, we are running out of flags in the GEMM descriptor, and we are somewhat wasteful in using one flag for special precision knobs....

PR #893 unveiled a very tricky out-of-bounds access when running the norm-to-normT TPP for 16bit. Due to the use of m128 / m256 broadcasts there were out-of-bounds accesses in the remainder...