Luo Yu
## Type of Change

Add new weight_dtype: int5 and int6

Support model quantization of int5 and int6
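As a rough illustration of what int5/int6 weight quantization involves, here is a minimal sketch of symmetric per-group quantization. It is not the code from this change: `QuantizedGroup`, `quantize_sym`, and `dequantize` are hypothetical names, and packing the 5- and 6-bit values into bytes is left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not from the PR): one quantized group of weights.
struct QuantizedGroup {
    float scale;                      // dequantized value = scale * q
    std::vector<std::int8_t> values;  // each entry fits in `bits` bits
};

// Symmetric quantization of one group to signed `bits`-wide integers.
// Assumption: the range is [-qmax, qmax] with qmax = 2^(bits-1) - 1,
// i.e. 15 for int5 and 31 for int6.
inline QuantizedGroup quantize_sym(const std::vector<float>& w, int bits) {
    const int qmax = (1 << (bits - 1)) - 1;

    // The largest magnitude in the group determines the scale.
    float amax = 0.0f;
    for (float v : w) amax = std::max(amax, std::fabs(v));

    QuantizedGroup g;
    g.scale = amax > 0.0f ? amax / static_cast<float>(qmax) : 1.0f;
    g.values.reserve(w.size());
    for (float v : w) {
        int q = static_cast<int>(std::lround(v / g.scale));
        q = std::min(std::max(q, -qmax), qmax);  // keep within the n-bit range
        g.values.push_back(static_cast<std::int8_t>(q));
    }
    return g;
}

// Recover an approximate float weight from its quantized value.
inline float dequantize(const QuantizedGroup& g, std::size_t i) {
    return g.scale * static_cast<float>(g.values[i]);
}
```

For example, `quantize_sym(weights, 5)` maps a group onto the 31 levels from -15 to 15, while `quantize_sym(weights, 6)` uses the 63 levels from -31 to 31, so int6 roughly halves the rounding step for the same group.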
### Describe the bug

I'm debugging the SYCL backend of [llama.cpp](https://github.com/ggerganov/llama.cpp). I found that some kernels output `-nan` when built in Debug mode. The root cause is that

```cpp
sycl::select_from_group(g, x, target_offset...
```
## Type of Change

Update the SYCL performance.

```shell
llama2-7b int4, sym, g128, comp_dtype=fp32, scale_dtype=fp32, KV_dtype=fp32
Max1100: 8.6ms/token
A770: 14.5ms/token
A770m: 15.8ms/token
A750: 15.4ms/token
155H: 51.6ms/token
```

```shell
cmake .....
```