Can RVV kernels be enabled by default?
I found several RVV kernels in this project, but they aren't enabled by default. For example, XNNPACK has an RVV H-swish kernel here, but it isn't enabled by default: https://github.com/google/XNNPACK/blob/9c65f03020c7fe960314d96fd33b1bbda24361b8/src/configs/unary-elementwise-config.c#L818-L821
Is there any reason or side effect preventing us from enabling them by default?
To configure microkernels we need to run benchmarks on RVV hardware. The kernels support m1, m2, m4 and m8; normally we'd run the benchmark, select the fastest variant, and plug that into the config. There is a microkernel benchmark that shows the kernels that could be used. Here the RVV variants error out with 'no V extension' and only the scalar variants report timings:

    f32_vhswish/rvv_u1v/N:3840/real_time     ERROR OCCURRED: 'no V extension'
    f32_vhswish/rvv_u1v/N:32640/real_time    ERROR OCCURRED: 'no V extension'
    f32_vhswish/rvv_u2v/N:3840/real_time     ERROR OCCURRED: 'no V extension'
    f32_vhswish/rvv_u2v/N:32640/real_time    ERROR OCCURRED: 'no V extension'
    f32_vhswish/rvv_u4v/N:3840/real_time     ERROR OCCURRED: 'no V extension'
    f32_vhswish/rvv_u4v/N:32640/real_time    ERROR OCCURRED: 'no V extension'
    f32_vhswish/rvv_u8v/N:3840/real_time     ERROR OCCURRED: 'no V extension'
    f32_vhswish/rvv_u8v/N:32640/real_time    ERROR OCCURRED: 'no V extension'
    f32_vhswish/scalar_u1/N:3840/real_time    322413 ns   322405 ns   2174 bytes=95.2816M/s
    f32_vhswish/scalar_u1/N:32640/real_time  2736093 ns  2736025 ns    256 bytes=95.4354M/s
    f32_vhswish/scalar_u2/N:3840/real_time    267837 ns   267827 ns   2611 bytes=114.696M/s
    f32_vhswish/scalar_u2/N:32640/real_time  2278400 ns  2278154 ns    308 bytes=114.607M/s
    f32_vhswish/scalar_u4/N:3840/real_time    289920 ns   289905 ns   2413 bytes=105.96M/s
    f32_vhswish/scalar_u4/N:32640/real_time  2479500 ns  2479095 ns    284 bytes=105.312M/s
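The "run the benchmark, select the fastest" step could be scripted roughly as follows. This is only a sketch, not part of XNNPACK: the helper name `fastest_variants` and the regular expression are assumptions made for illustration, and it assumes the benchmark prints rows in the format quoted in this comment. Rows that report an error instead of a timing are ignored.

```python
import re

# A few rows copied from the benchmark output quoted in this thread.
SAMPLE = """\
f32_vhswish/rvv_u1v/N:3840/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u2v/N:32640/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/scalar_u1/N:3840/real_time 322413 ns 322405 ns 2174 bytes=95.2816M/s
f32_vhswish/scalar_u1/N:32640/real_time 2736093 ns 2736025 ns 256 bytes=95.4354M/s
f32_vhswish/scalar_u2/N:3840/real_time 267837 ns 267827 ns 2611 bytes=114.696M/s
f32_vhswish/scalar_u2/N:32640/real_time 2278400 ns 2278154 ns 308 bytes=114.607M/s
f32_vhswish/scalar_u4/N:3840/real_time 289920 ns 289905 ns 2413 bytes=105.96M/s
f32_vhswish/scalar_u4/N:32640/real_time 2479500 ns 2479095 ns 284 bytes=105.312M/s
"""

# Matches "<op>/<variant>/N:<size>/real_time <time> ns ..." (assumed row format).
ROW = re.compile(
    r"^(?P<op>\w+)/(?P<variant>\w+)/N:(?P<n>\d+)/real_time\s+(?P<real>[\d.]+) ns"
)


def fastest_variants(output: str) -> dict[int, tuple[str, float]]:
    """Return {N: (variant, real_time_ns)} for the fastest variant at each size."""
    best: dict[int, tuple[str, float]] = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # error rows (e.g. 'no V extension') carry no timing
        n = int(match.group("n"))
        real_ns = float(match.group("real"))
        if n not in best or real_ns < best[n][1]:
            best[n] = (match.group("variant"), real_ns)
    return best


if __name__ == "__main__":
    for n, (variant, real_ns) in sorted(fastest_variants(SAMPLE).items()):
        print(f"N={n}: {variant} ({real_ns:.0f} ns)")
```

On real RVV hardware the same comparison would include the rvv_u1v to rvv_u8v rows, and the winner is what would go into the config.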
To test that they work, I manually hack the cpu_info RVV detection and run on QEMU. We prefer to test properly on real hardware before enabling them, but if you want to help with that, contributions are welcome.
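For anyone checking whether a board or emulator reports the V extension before running the benchmarks, a rough sketch is below. It is not how XNNPACK or its CPU detection works; the function name `has_v_extension` is hypothetical, and it assumes an ISA string of the kind Linux prints on the `isa` line of /proc/cpuinfo (for example `rv64imafdcv_zicsr`), without version numbers. Multi-letter extensions such as `zve32x` are deliberately not counted as full V.

```python
import re


def has_v_extension(isa: str) -> bool:
    """Return True if the single-letter 'v' extension appears in a RISC-V ISA string."""
    match = re.match(r"rv(32|64|128)([a-z]*)", isa.strip().lower())
    if match is None:
        return False  # not a RISC-V ISA string
    # Single-letter extensions end where the first multi-letter extension
    # (prefix z, s or x) begins; an underscore already stops the match above.
    single_letter = re.split(r"[zsx]", match.group(2), maxsplit=1)[0]
    return "v" in single_letter


if __name__ == "__main__":
    for isa in ("rv64imafdcv_zicsr_zifencei", "rv64imafdc_zve32x", "rv64gcv"):
        print(isa, has_v_extension(isa))
```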