Andrei Hutu

This commit adds fp16 test cases to the conv2d NHWC TOPI schedules for `arm_cpu`. Following the example of #8529, the numpy reference conv2d output is computed in fp32 instead of...

This commit adds a scalable `arm_cpu` conv2d NHWC schedule for fp32 which generates SME instructions by using the tensor intrinsics introduced in #16921. Alongside the SME schedule, the logic of...

This commit introduces rewrite rules for indices which can arise from splitting axes by scalable factors (e.g. `xo, xi = sch.split(x, factors=[None, 8 * T.vscale()])`): (v_x_o *...
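The identities that index rewrites like these can rely on follow from the semantics of a split. If `x = xo * f + xi` with `0 <= xi < f`, then `(xo * f + xi) // f == xo` and `(xo * f + xi) % f == xi` for any positive `f`, including a scalable factor `f = 8 * vscale` whose value is only known at runtime. The sketch below checks those identities in plain Python for a few assumed `vscale` values. It is not TVM's arithmetic simplifier, and the helper names `split_index` and `check_split_identities` are hypothetical; the commit's actual rewrite rules are truncated above.

```python
def split_index(x: int, factor: int) -> tuple:
    """Decompose a flat loop index into (outer, inner) for a split by `factor`."""
    return x // factor, x % factor


def check_split_identities(vscale: int, extent: int) -> bool:
    """Check the simplification identities for a split by 8 * vscale."""
    factor = 8 * vscale  # scalable factor; vscale is an assumed runtime value
    for x in range(extent):
        xo, xi = split_index(x, factor)
        fused = xo * factor + xi
        assert 0 <= xi < factor
        assert fused == x
        assert fused // factor == xo  # (xo * f + xi) // f  ->  xo
        assert fused % factor == xi   # (xo * f + xi) % f   ->  xi
    return True


# The identities hold whatever vscale turns out to be at runtime.
for vscale in (1, 2, 4, 16):
    check_split_identities(vscale, extent=1000)
```

Because the identities hold for every positive factor, a simplifier can apply them symbolically without knowing the concrete vector length.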

This commit extends the SME conv2d NHWC schedule to support convolutions with float16 inputs (data and kernel) and a float32 output using the tensor intrinsics added in #16981. cc @ekalda...

This commit introduces an f32 ASIMD `softmax` JIT implementation using the `exp` eltwise injector added in #4376, while also improving performance for the existing `sve_*` implementations (primarily by...