fbarchard
To configure microkernels we need to run benchmarks on RVV hardware. The kernels support m1, m2, m4 and m8 and normally we'd run the benchmark, select the fastest and plug...
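As a rough illustration of the "run the benchmark, select the fastest" step, here is a minimal sketch in C. The helper name `hyp_select_lmul` and the timing array are hypothetical and not part of XNNPACK; the times would come from whatever benchmark is run on the RVV hardware.

```c
#include <stddef.h>

/* Hypothetical helper (not an XNNPACK API): given measured times for the
   m1, m2, m4 and m8 variants of a kernel, in that order, return the LMUL
   (1, 2, 4 or 8) with the lowest time. On a tie the smaller LMUL wins. */
int hyp_select_lmul(const double ns_per_op[4]) {
  static const int lmul[4] = {1, 2, 4, 8};
  size_t best = 0;
  for (size_t i = 1; i < 4; i++) {
    if (ns_per_op[i] < ns_per_op[best]) {
      best = i;
    }
  }
  return lmul[best];
}
```

The returned value would then decide which variant gets plugged into the config for that kernel.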
By linear do you mean QD8 GEMM with linear instead of minmax?
I ran into the same issue on Intel I think, and hacked a solution by doing the syscall in assembly. ``` #if XNN_ARCH_X86_64 && defined(__linux__) ssize_t xnn_syscall(size_t rax, size_t rdi,...
I am, but the time frame is roughly end of year. I plan to focus on a full set of fp32 microkernels first.
8-bit (or 4-bit) weights can cause an alignment issue for bias and scale, which are 32-bit elements and usually vectors. dwconv is an igemm. igemm is a...
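To make the alignment issue concrete, here is a generic C sketch, not XNNPACK's actual packing code. It assumes a made-up layout of 3 int8 weights followed directly by one int32 bias, and reads the bias with `memcpy`, which is a standard way to load a value without assuming alignment. The `hyp_` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit bias from an arbitrary byte address.
   memcpy makes no alignment assumption, unlike *(const int32_t*)p. */
int32_t hyp_load_bias_unaligned(const void* p) {
  int32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Assumed layout for illustration: 3 int8 weights, then one int32 bias.
   The bias starts at byte offset 3 in the buffer, so it is generally
   not 4-byte aligned. */
int32_t hyp_demo_roundtrip(int32_t bias) {
  unsigned char packed[3 + sizeof(int32_t)];
  const int8_t weights[3] = {1, -2, 3};
  memcpy(packed, weights, sizeof(weights));
  memcpy(packed + sizeof(weights), &bias, sizeof(bias));
  return hyp_load_bias_unaligned(packed + sizeof(weights));
}
```

A vector load that requires alignment would not have this escape hatch, which is why the layout of bias and scale relative to the narrow weights matters.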
If the multipass specifically has the issue but single pass works, it's likely the temporary accumulation buffer is not int32 aligned.
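One way to test that theory is to check the buffer pointer before the kernel is called. The sketch below is plain C11 with hypothetical helper names; it does not reflect how XNNPACK allocates its temporary buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of alignment bytes. */
int hyp_is_aligned(const void* p, size_t alignment) {
  return ((uintptr_t)p % alignment) == 0;
}

/* Round a byte count up to the next multiple of alignment.
   alignment must be a power of two. */
size_t hyp_align_up(size_t n, size_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

/* Demo: an int32-aligned scratch buffer passes the check, while the same
   buffer offset by one byte fails it. Returns 1 when both hold. */
int hyp_demo_detects_misalignment(void) {
  _Alignas(int32_t) unsigned char scratch[16];
  return hyp_is_aligned(scratch, sizeof(int32_t)) &&
         !hyp_is_aligned(scratch + 1, sizeof(int32_t));
}
```

If the accumulation buffer is carved out of a larger byte buffer, rounding its offset with something like `hyp_align_up(offset, sizeof(int32_t))` would keep it int32 aligned.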
armsimd32 is ARMv6-style SIMD, 4 bytes wide. It provides optimization on CPUs without NEON. In bazel there is a section with the build options applied: ``` xnnpack_cc_library( name =...
There is an armv7 script for android. When I tried it with NDK 21 it had a build error against I8MM due to an old version of clang being used,...
The end2end_bench shows spmm on arm using threads.
The build system determines which kernels to build. The macros reflect what was enabled and won't test/use the disabled kernels. With bazel there are flags to control each instruction set:...