Engininja2

17 comments by Engininja2

`nodes`, `paramsDriver`, and `paramsRuntime` are being used across multiple calls of the function but their data is only loaded in an earlier call. Should they be static?
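To illustrate the question above, here is a minimal sketch of how a `static` local keeps its contents between calls while an ordinary local does not. The function names and the `load` flag are hypothetical and are not taken from the code under review; only the variable name `nodes` echoes the comment.

```cpp
#include <vector>

// Hypothetical stand-in for a function that is called many times but only
// fills its buffer on an earlier call (load == true).
int count_nodes_static(bool load) {
    // static: constructed once, contents survive between calls
    static std::vector<int> nodes;
    if (load) {
        nodes = {1, 2, 3};
    }
    return static_cast<int>(nodes.size());
}

// Same logic with an ordinary local: the vector is empty on every call
// that does not load it, so later calls see no data.
int count_nodes_local(bool load) {
    std::vector<int> nodes;
    if (load) {
        nodes = {1, 2, 3};
    }
    return static_cast<int>(nodes.size());
}
```

Calling `count_nodes_static(true)` and then `count_nodes_static(false)` still sees three elements on the second call, whereas `count_nodes_local(false)` sees none. Note that a `static` local is shared by every caller of the function, which may or may not be what the code wants.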

The text models seem to just be llama, so you can use convert.py for those.

There are a few things I can think of that could be slowing you down. First, `LLAMA_HIP_UMA=1` is for integrated graphics in the CPU, and will slow down actual...

Windows doesn't support `HSA_OVERRIDE_GFX_VERSION` and probably doesn't have its own equivalent. You would need to compile a Tensile library for gfx1103 for rocBLAS 5.7, or use Linux.

Unlike RDNA2, where everything is more or less gfx1030, RDNA3 ISAs have significant differences. The linked comment notes '(more than "-ngl 32" resulted in gibberish)'. You could try offloading 1...

The RX 560 may be slower in part because it's using the fallback code for `__dp4a()`: its ISA lacks a corresponding opcode, and the compiler may not be choosing...
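For context, `__dp4a()` computes a dot product of four packed signed 8-bit values and adds it to a 32-bit accumulator. The following is a portable sketch of that operation, not the project's actual fallback; the name `dp4a_fallback` is hypothetical. It shows why a GPU without a matching opcode has to spend several instructions on what is otherwise a single one.

```cpp
#include <cstdint>

// Hypothetical portable emulation of a 4-way int8 dot product with accumulate.
// a and b each pack four signed 8-bit lanes; c is the 32-bit accumulator.
int dp4a_fallback(int a, int b, int c) {
    const uint32_t ua = static_cast<uint32_t>(a);
    const uint32_t ub = static_cast<uint32_t>(b);
    for (int i = 0; i < 4; ++i) {
        // Extract lane i and reinterpret it as a signed 8-bit value.
        const int8_t va = static_cast<int8_t>((ua >> (8 * i)) & 0xFF);
        const int8_t vb = static_cast<int8_t>((ub >> (8 * i)) & 0xFF);
        c += static_cast<int>(va) * static_cast<int>(vb);
    }
    return c;
}
```

For example, packing the lanes (1, 2, 3, 4) as `0x04030201` and (5, 6, 7, 8) as `0x08070605` gives 1·5 + 2·6 + 3·7 + 4·8 = 70.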

Could this be from newlines in your shell? You might be running `./main -m /models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf -r ''` and then separately trying to run `--in-prefix "\nuser\n\n"` and so on.

`__shfl_xor()` for half2 was added in ROCm 5.6. You could install the newer HIP SDK version 5.7 and use that instead, or try this PR: #7263

You can set them as environment variables before running cmake, or you can pass them as arguments.

```cmd
cmake -B build -G "Ninja" -DCMAKE_C_COMPILER=clang.exe -DCMAKE_CXX_COMPILER=clang++.exe -DLLAMA_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release
```

If the...

After trying it, even if you build llama.cpp on Windows without the HIP SDK bin folder in your path (C:\Program Files\AMD\ROCm\5.5\bin\) the resulting executables won't run because they can't find...