Results: 61 comments of Meng, Hengyu

Hi, thank you for your excellent work! I am quite new to MLIR, so my questions may be naive; please bear with me. I see ```ArgMax``` is supported in onnx-mlir but...

Hi wenjingk, we prefer tile-wise sparsity in INC, which is a balance between accuracy and performance. Tile-wise sparsity means dividing the whole matrix into tiles; if there are non-zero elements...
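To illustrate the idea, here is a minimal NumPy sketch of tile-wise sparsity: split a weight matrix into fixed-size tiles and keep a tile only if it contains any non-zero element. The function name and the 4x4 tile shape are illustrative assumptions, not the actual INC API.

```python
import numpy as np

def tile_keep_mask(weight, tile=(4, 4)):
    """Hypothetical sketch: return a boolean mask per tile.
    A tile is kept (True) if it has any non-zero element,
    and pruned as a whole (False) otherwise."""
    rows, cols = weight.shape
    th, tw = tile
    assert rows % th == 0 and cols % tw == 0, "shape must be tileable"
    # reshape to (n_tile_rows, th, n_tile_cols, tw) so each tile is addressable
    tiles = weight.reshape(rows // th, th, cols // tw, tw)
    # a tile survives if the sum of absolute values inside it is non-zero
    keep = np.abs(tiles).sum(axis=(1, 3)) > 0
    return keep

# toy example: an 8x8 matrix with a single non-zero entry
w = np.zeros((8, 8))
w[0, 0] = 1.0
mask = tile_keep_mask(w, tile=(4, 4))
print(mask.shape)  # (2, 2): one flag per 4x4 tile
print(mask)
```

Pruning whole tiles rather than individual elements is what makes the pattern hardware-friendly: a dense kernel can skip an entire tile in one check.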

hi @Jacoby1218 could you provide some reference data to show the magnitude of the gaps? For example, performance on an RTX 4070 Ti (16 GB), or running entirely on the iGPU/CPU?

I think this may be due to a lack of optimization for multi-batch workloads; it has been recorded in https://github.com/ggerganov/llama.cpp/discussions/5277, please stay tuned!

I think this has been improved with https://github.com/ggerganov/llama.cpp/pull/6217, please give it a try.

Not sure whether this is expected: GPU MAX 1100 passed all tests, but on the MTL iGPU (f16 off, Win11, Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)): ```powershell C:\Users\gta\Documents\llama.cpp\build>bin\test-backend-ops.exe test -b SYCL0 -o SOFT_MAX...

> > Not sure whether it is expected
> > GPU MAX1100 passed all
> > but on MTL iGPU, f16 off, win11, Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
> ...

Have you tried ```GGML_SYCL_DEVICE=3```? This is weird, because the dGPU usually appears as the first device, but in your case they are 3 and 5. Can you try the following...

https://github.com/ggerganov/llama.cpp/pull/5624 @aahouzi I think the "hanging" issue has been solved by the PR above; did you use this commit?

> @NeoZhangJianyu I saw you created a revert PR; is #5901 merged, or is there no change yet?

It was merged by mistake. The author will re-implement it with...