Dillon
I think this should be fixed now.
We have this SME2 kernel already: https://github.com/google/XNNPACK/blob/master/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c If the only difference is multi-vector load/store instructions, we'd rather avoid having two almost identical kernels coming from two very different sources with...
@vgundlur can you please address the feedback I raised in this comment? > We have this SME2 kernel already: https://github.com/google/XNNPACK/blob/master/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c > > If the only difference is multi-vector load/store instructions,...
This is failing tests in the same way that was fixed for SME2 kernels by this change: https://github.com/google/XNNPACK/pull/8750/files#diff-e7b59d3d757d18b4fff21fe85867107abe17cf9b324f474b79fbb39ca380ec6c I think you just need to adjust that new logic in generate-gemm-test.py...
LiveSPICE currently only supports importing basic component models. I'm guessing that in this case, the thing you are looking for is a `.subckt` directive, which isn't supported. It might be...
I'm wondering if the problem here is that we aren't making our libraries private or something? Maybe they should be in a subfolder? It seems like this kind of collision...
I tried various things that seem like they should work, but CMake seems to foil anything that might be considered any kind of namespace. We should just do this renaming.
This is broken for us with many errors of the form: ``` src/qb4-packw/gen/qb4-packw-x16c4-gemm-goi-scalar.c:23:38: error: cast from 'const unsigned char *' to 'unsigned int *' drops const qualifier [-Werror,-Wcast-qual] 23 |...
This is still not building: ``` XNNPACK/src/qb4-packw/gen/qb4-packw-x16c8-gemm-goi-aarch64-neondot.c:71:19: error: unused variable 'veor_mask' [-Werror,-Wunused-variable] 71 | const int8x16_t veor_mask = vmovq_n_s8(UINT8_C(0x88)); | ^~~~~~~~~ XNNPACK/src/qb4-packw/gen/qb4-packw-x16c8-gemm-goi-aarch64-neondot.c:72:19: error: unused variable 'neg_zp' [-Werror,-Wunused-variable] 72 | const...
This is a unary op, where we should be generating a LUT. I'm not sure that this implementation is going to be that much better. It's vectorized, but it needs...
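For reference, the LUT approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not XNNPACK's actual generator or API: it assumes an 8-bit element type (so 256 table entries cover every input), and `example_op`, `build_lut`, and `apply_lut` are hypothetical names invented here.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical unary op, used only for illustration: saturating doubling.
static uint8_t example_op(uint8_t x) {
  const unsigned int doubled = 2u * (unsigned int) x;
  return (uint8_t) (doubled > 255u ? 255u : doubled);
}

// Evaluate the op once for every possible 8-bit input.
void build_lut(uint8_t table[256]) {
  for (unsigned int i = 0; i < 256; i++) {
    table[i] = example_op((uint8_t) i);
  }
}

// Applying the op is then one table lookup per element,
// regardless of how expensive the op itself is.
void apply_lut(const uint8_t table[256], const uint8_t* input,
               uint8_t* output, size_t n) {
  for (size_t i = 0; i < n; i++) {
    output[i] = table[input[i]];
  }
}
```

The table is built once per operator, so the per-element cost does not depend on the op being computed; whether that beats a dedicated vectorized kernel would need to be measured.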