Diego Devesa
I would prefer if the scaling factors were exported as a tensor rather than as metadata; it would remove quite a bit of code and be more efficient.
I think this will cause the rope to always run on the CPU, because the scheduler prefers running ops that use weights in the backend of the...
Looking at the graphs, it seems that the load time increased, but the throughput looks similar. Maybe it was a fluke? I can't reproduce it on my system either. |...
Since this requires cmake 3.21, I think it would be good to add a `cmake_minimum_required(VERSION 3.21)` to the HIP section, similar to how the CUDA section requires version 3.17.
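A minimal sketch of what the suggested check might look like. It assumes the HIP section is guarded by an option named `LLAMA_HIPBLAS`, which is an assumption here, and the actual option name in the build may differ. CMake added first-class HIP language support (`enable_language(HIP)`) in version 3.21, so the version requirement sits inside the HIP branch. This mirrors the way the CUDA section requires 3.17.

```cmake
# LLAMA_HIPBLAS is an assumed option name used for illustration only.
if (LLAMA_HIPBLAS)
    # HIP as a CMake language (enable_language(HIP)) needs CMake >= 3.21.
    # Requiring it only in this branch leaves non-HIP builds unaffected,
    # as the CUDA section does with cmake_minimum_required(VERSION 3.17).
    cmake_minimum_required(VERSION 3.21)

    enable_language(HIP)
    find_package(hip REQUIRED)

    # ... rest of the HIP configuration ...
endif()
```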
I was hoping that with these changes you wouldn't need to pass additional parameters to `cmake` to workaround the issue, beyond enabling HIP. It seems very suspect that `cmake` thinks...
Can you update the build instructions for HIP in the `README`?
I think it is good (and important) to follow AMD's recommendations for building. If Fedora wants to deviate from this process, they should deal with the problems that it causes.
> Isn't AMD's recommendation exactly this - to build everything exclusively with their llvm version or am I mistaken?

I don't see that in the documentation linked here (https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake). It...
As already mentioned, you can have this functionality with the RPC backend, which is already merged. Check https://github.com/ggerganov/llama.cpp/pull/6829 and https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc for more details.
Looks good. It would be nice to have other parameters in the matrix, such as different values of `-ngl`, but that's not important right now.