Meng, Hengyu
@shibe2 I agree it is more of a feature enhancement. I think it would be quite useful if llama.cpp could calculate the appropriate n_ctx automatically, especially for serving. Any plans on...
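The kind of calculation I have in mind is roughly the one below (a back-of-envelope sketch; the layer count, KV head count, head dimension, and cache type are assumptions for a Llama-3-70B-style GQA model, and the real values would have to come from the GGUF metadata):

```
# KV cache bytes ≈ 2 (K and V) * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_element
# Assumed values: 80 layers, n_ctx 8192, 8 KV heads, head_dim 128, f16 cache (2 bytes)
echo $(( 2 * 80 * 8192 * 8 * 128 * 2 ))   # 2684354560 bytes, roughly 2.5 GiB
```

The server could compare this against the free device memory left after the weights are loaded and pick the largest n_ctx that fits.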
@luoyu-intel
@ClarkChin08 can you attach the measurement results? For example, llama3-70B on 8 GPUs: memory consumption on each GPU and performance?
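Something along these lines would be enough (a sketch; the model path is a placeholder, `-sm layer` assumes you split by layer across the 8 GPUs, and the memory query assumes Intel's xpu-smi is installed):

```
# throughput per configuration
./bin/llama-bench -m models/llama-3-70b-instruct.Q4_K_M.gguf -ngl 99 -sm layer -p 512 -n 128
# per-GPU memory while the benchmark runs (repeat for each device index)
xpu-smi stats -d 0
```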
Can you reproduce the issue via `llama-cli`?
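For example, something like this (the model path and prompt are just placeholders):

```
./bin/llama-cli -m models/your-model.Q4_K_M.gguf -ngl 99 -p "Hello" -n 32
```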
Can you reproduce it with a unit test? For example: `./bin/test-backend-ops -b SYCL0`
I did some searching and found related issues in the intel/llvm repo. @MrSidims I saw a similar issue at https://github.com/intel/llvm/pull/4025#issuecomment-870823000, could you give us some guidance?
OK, can you run the `tanh` op alone and see whether it crashes every time? `.\bin\test-backend-ops -b SYCL0 -o TANH`
It seems there might be a more fundamental issue causing this. As a temporary solution, could you please try updating your driver, operating system, kernel, and oneAPI? This might address...
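To confirm what you are on before and after updating, something like this should be enough (assuming the oneAPI environment is sourced and `clinfo` is installed):

```
sycl-ls                              # SYCL devices and the runtime each is exposed through
icpx --version                       # oneAPI DPC++ compiler version
uname -r                             # kernel version
clinfo | grep -i "driver version"    # GPU driver version as reported through OpenCL
```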
Hi @MrSidims, thank you for your quick reply. I can confirm that no AOT option is set currently. The whole compilation command is the following: https://github.com/ggerganov/llama.cpp/blob/0d2c7321e9678e91b760ebe57f0d063856bb018b/ggml/src/CMakeLists.txt#L465-L518 > If this compilation happens in JIT...
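For reference, my understanding of the JIT vs. AOT difference at the compiler level is roughly this (a sketch with icpx; the device name is only an example, and the actual build goes through the CMake flags linked above):

```
# JIT: device code is kept as SPIR-V and compiled by the GPU driver at runtime
icpx -fsycl -fsycl-targets=spir64 kernel.cpp -o app_jit

# AOT: device code is compiled ahead of time for a specific GPU (example device name)
icpx -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device acm-g10" kernel.cpp -o app_aot
```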