fbarchard

83 comments by fbarchard

That's an odd one. Your compiler must be new enough to accept -mavxvnni, and it seems it got past the compile and failed during link? That's a new one... its...

Issue #5892 is that local compilers often only support CPUs for the host they are on. In this case the CUDA linker appears not to support VNNI. If CUDA were built...

There is a merge conflict for the internal review. Can you rebase and/or break this into smaller PRs?

nr2 is an MRx2 GEMM, 2 floats wide. On SSE and NEON, which normally use 4 floats per vector, it allows a faster GEMM. But it is optional...
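To make the "MRx2, 2 floats wide" shape concrete, here is a minimal scalar sketch of such a tile in C. The function name gemm_mrx2_ref, its signature, and the dense row-major layout are assumptions made for illustration; this is not XNNPACK's actual microkernel interface, and it does not model the packed weights or vector instructions a real kernel would use.

```c
#include <stddef.h>

/* Hypothetical scalar reference for an MRx2 GEMM tile (not XNNPACK's API).
 * Computes C[mr x 2] = A[mr x kc] * B[kc x 2].
 * Assumption: all matrices are dense and row-major. */
void gemm_mrx2_ref(size_t mr, size_t kc,
                   const float* a, const float* b, float* c) {
  for (size_t m = 0; m < mr; m++) {
    /* Two accumulators per row: the output tile is 2 floats wide (NR = 2). */
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    for (size_t k = 0; k < kc; k++) {
      const float va = a[m * kc + k];
      acc0 += va * b[k * 2 + 0];
      acc1 += va * b[k * 2 + 1];
    }
    c[m * 2 + 0] = acc0;
    c[m * 2 + 1] = acc1;
  }
}
```

Each output row needs only two accumulators, which is the sense in which the tile is narrower than a 4-float SSE or NEON vector.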

Re nr2 - if you didn't have such huge vectors you wouldn't have this problem :-) nr2 doesn't come up much, and you don't have to specialize for it, especially...

Enable RVV GEMM/IGEMM 7 x m4 has landed in https://github.com/google/XNNPACK/pull/7035. You can close this PR and, if you want, add an nr2 enable as a follow-up.

Note that this is due to Visual C register spills. Clang produces better code with 5x16.

Re YMMs - yes, I tested that too, and in fact the old code for qs8 8-bit output on AVX and AVX512 used to combine all the bytes and do...

Hi, thanks for the report. When I give it a quick try with blaze, which is like bazel, I'm able to build the abs bench: blaze build --config=lexan_x86_64 -c opt //third_party/XNNPACK/bench:abs_bench...

The ARM assembly is in .S files meant to be compiled with gcc or clang. As far as I know there's no way to assemble them with Visual Studio. The...