Alberto Cabrera Pérez
Here's the performance I got compared to master using FP32:

| model | device | test | t/s bb1681fb (5376) | t/s 53113d0a (5352) | speedup |
|--------------------|---------------|--------|----------------------|----------------------|---------|
| qwen2...
Results for FP16: So far the results are roughly equivalent, and even better in some cases. A couple of cases may need a closer look, like...
> @Alcpz is this OK now? Thanks

Yes. I agree with @joeatodd's review. Accepting your changes, assuming that you will finish addressing his suggestions. Sorry for missing this.
I had a flaky test a day ago for `level_zero` as well, but it was a different test. So far I haven't been able to reproduce the issue. We will...
Tests disabled for now. (#14855 merged)
I've been able to reproduce the issue, but only on Windows. Linux seems unaffected. As reported, setting `GGML_SYCL_DISABLE_OPT=1` works without problems. There seems to be something wrong with the reorder, but...
@OuadiElfarouki https://github.com/ggerganov/llama.cpp/pull/9707 got merged :tada: