Wei (Neil) Su
Summary: Add an auto-vectorized implementation of the int8 CPU TBE API. Differential Revision: D54286969
Summary: Increase prefetching and reduce backend stalls, as suggested by NVIDIA. Differential Revision: D53552699
Summary: Try to auto-vectorize the CPU TBE-NBit reference implementation. Differential Revision: D50142928
Summary: Fix an unused variable flagged in GitHub CI. Differential Revision: D56366784
Summary: Add a sequential CPU TBE for the int4 weight type and int4 output type. Differential Revision: D60242110