fbarchard comments

Results: 83 comments of fbarchard

unsupported instruction `vpdpbusd'

That's an odd one. Your compiler must be new enough to accept -mavxvnni, and it seems it got past the compile and failed during link? That's a new one... its...
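
For readers unfamiliar with the instruction named in the error, here is a minimal scalar sketch of what one 32-bit lane of `vpdpbusd` computes: four unsigned bytes multiplied by four signed bytes, with the products added to a 32-bit accumulator without saturation. The function name `dpbusd_lane` is made up for illustration and is not an XNNPACK or compiler API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical scalar model of a single 32-bit lane of VPDPBUSD.
// a: four unsigned 8-bit values, b: four signed 8-bit values.
// The four products are summed into acc with wrap-around (no saturation).
static int32_t dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4]) {
  assert(a != NULL && b != NULL);
  uint32_t sum = (uint32_t) acc;  // unsigned math gives well-defined wrapping
  for (int i = 0; i < 4; i++) {
    sum += (uint32_t) ((int32_t) a[i] * (int32_t) b[i]);
  }
  return (int32_t) sum;
}
```

An assembler or linker that does not know AVX-VNNI rejects the real instruction, whereas a scalar model like this builds anywhere.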

unsupported instruction `vpdpbusd'

Issue #5892 is that local compilers often only support CPUs for the host they are on. In this case the CUDA linker appears to not support VNNI. If CUDA were built...

Add QS8_QC8W GEMM/IGEMM microkernels for Wasm Relaxed Unsigned and Signed …

There is a merge conflict for the internal review. Can you rebase and/or break this into smaller PRs?

Enable RVV GEMM/IGEMM 7 x m4 in operator config

nr2 is an MRx2 GEMM, 2 floats wide. On SSE and NEON, which normally use 4 floats per vector, it allows a faster GEMM. But it is optional...
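
To make the MRx2 shape concrete, here is a minimal scalar sketch of a GEMM tile that produces MR rows by 2 output columns. The function name `gemm_mrx2_ref` and the dense row-major layout are assumptions for illustration only; XNNPACK's real microkernels use packed weights and vector instructions.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical scalar reference for an MRx2 GEMM tile:
//   C[mr x 2] = A[mr x k] * B[k x 2], all matrices dense and row-major.
// This layout is illustrative and is not XNNPACK's packed format.
static void gemm_mrx2_ref(size_t mr, size_t k,
                          const float* a,  // mr x k
                          const float* b,  // k x 2
                          float* c) {      // mr x 2
  assert(mr > 0 && k > 0);
  for (size_t m = 0; m < mr; m++) {
    // Two accumulators per row: one for each of the 2 output columns.
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    for (size_t i = 0; i < k; i++) {
      const float va = a[m * k + i];
      acc0 += va * b[i * 2 + 0];
      acc1 += va * b[i * 2 + 1];
    }
    c[m * 2 + 0] = acc0;
    c[m * 2 + 1] = acc1;
  }
}
```

With only 2 output columns per row, the tile covers half of a 4-float SSE or NEON vector, which is the case the comment describes.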

Enable RVV GEMM/IGEMM 7 x m4 in operator config

Re nr2 - if you didn't have such huge vectors you wouldn't have this problem :-) nr2 doesn't come up much, and you don't have to specialize for it, especially...

Enable RVV GEMM/IGEMM 7 x m4 in operator config

Enable RVV GEMM/IGEMM 7 x m4 has landed in https://github.com/google/XNNPACK/pull/7035. You can close this PR and add an nr2 enable as a follow-up if needed.

4x16s4 fp32-gemm kernel have better performance than default(5x16) kernel for meteor lake

Note that this is due to Visual C register spills. Clang produces better code with 5x16.

QB4W AVX2 GEMM Kernels

Re YMMs - yes, I tested that too, and in fact the old code for qs8 8-bit output on AVX and AVX512 used to combine all the bytes and do...

Failed to compile XNNPACK on WoA(Windows on ARM) device.

Hi, thanks for the report. When I give it a quick try with blaze, which is like bazel, I'm able to build the abs bench: blaze build --config=lexan_x86_64 -c opt //third_party/XNNPACK/bench:abs_bench...

Failed to compile XNNPACK on WoA(Windows on ARM) device.

The ARM assembly is in .S files meant to be compiled with gcc or clang. As far as I know there's no way to assemble them with Visual Studio. The...