Alexander Heinecke
To support AWS Graviton3 (C7G) instances, this PR aims to extend the A64FX support to SVE256.
When running the tpv16 example with 256 processes, fault-receiver 4 is written by processes 240 and 242: -rw-rw-r-- 1 aheineck aheineck 37824 Oct 19 11:06 tpv16-faultreceiver-00004-00240.dat -rw-rw-r-- 1 aheineck aheineck...
Adding AVX512_F16 support to all TPPs
This is early work in progress for now; it needs several refactors and additions before it can be merged, e.g. all FIX-ELTW-SSE comments need to be addressed.
#834 merges initial support for select & blend TPPs with respect to the datatypes that can be used as selector. In the future we should add support for various widths of integer...
Today our CI infrastructure is focused on testing the normal/positive case, if it tests at all :-), across the entire eltwise, equation and GEMM TPP testing world. Scripts have been written...
We have support for implicit datatypes. Some of the memory movement TPPs should be able to leverage this.
With the release of SPR, the AMX ABI was slightly revised: tileconfig is neither callee-save nor caller-save. It has to be saved on the call stack when it is changed. Right...
For various LLM and GNN operators, we need a fast vector dot product. Right now we only have a very slow A^T GEMM for M=1, or we have to run a sequence...
#794 has shown that we have some holes in the testing infra, which we should fix by running all tests not only with pure f16 and b16 inputs, but also their...