Diego Devesa comments

Results: 361 comments by Diego Devesa

Add phi3 128K model support

I would prefer if the scaling factors were exported as a tensor rather than as metadata. It would remove quite a bit of code, and it would be more efficient.
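A tensor fits naturally here because the per-dimension factors are applied element-wise when the rotation frequencies are computed, so they behave like any other weight array. The following is a minimal Python sketch. It assumes the LongRoPE-style formula `inv_freq[i] = 1 / (factor[i] * base^(2i/d))`, and `rope_inv_freqs` is a hypothetical helper, not llama.cpp code.

```python
def rope_inv_freqs(head_dim, base, factors):
    """Per-dimension RoPE inverse frequencies, each divided by a scaling factor.

    Assumption: LongRoPE-style scaling, i.e.
    inv_freq[i] = 1 / (factors[i] * base ** (2 * i / head_dim)).
    """
    n = head_dim // 2  # one frequency per rotated pair of dimensions
    if len(factors) != n:
        raise ValueError(f"expected {n} factors, got {len(factors)}")
    return [1.0 / (f * base ** (2 * i / head_dim)) for i, f in enumerate(factors)]
```

When every factor is 1.0, this reduces to the standard RoPE frequencies. In tensor form, the whole adjustment is a single element-wise division, which is presumably why exporting the factors as a tensor keeps the code small.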

Add phi3 128K model support

I think this is going to cause the rope op to always run on the CPU, because the scheduler prefers running ops that use weights in the backend of the...
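The concern can be modelled with a deliberately simplified, hypothetical assignment rule: an op runs on the backend that holds its weight input. This is not ggml's actual scheduler, which has more passes and special cases. The `Tensor`, `Op`, and `pick_backend` names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Tensor:
    name: str
    backend: str  # where the tensor's data lives, e.g. "CPU" or "GPU"


@dataclass
class Op:
    name: str
    # Each input is a (Tensor, is_weight) pair.
    inputs: list = field(default_factory=list)


def pick_backend(op, default="GPU"):
    """Hypothetical rule: run the op on the backend of its first weight input."""
    for tensor, is_weight in op.inputs:
        if is_weight:
            return tensor.backend
    return default
```

Under this rule, a small factors tensor that stays in CPU memory is enough to pull the rope op onto the CPU, even when the activations it rotates live on the GPU.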

Add phi3 128K model support

Looking at the graphs, it seems that the load time increased, but the throughput looks similar. Maybe it was a fluke? I can't reproduce it on my system either. ...

ROCm: use native CMake HIP support

Since this requires cmake 3.21, I think it would be good to add a `cmake_minimum_required(VERSION 3.21)` to the HIP section, similar to how the CUDA section requires version 3.17.

ROCm: use native CMake HIP support

I was hoping that with these changes you wouldn't need to pass any additional parameters to `cmake` to work around the issue, beyond enabling HIP. It seems very suspect that `cmake` thinks...

ROCm: use native CMake HIP support

Can you update the build instructions for HIP in the `README`?

ROCm: use native CMake HIP support

I think it is good (and important) to follow AMD's recommendations for building. If Fedora wants to deviate from this process, they should deal with the problems that it causes.

ROCm: use native CMake HIP support

> Isn't AMD's recommendation exactly this - to build everything exclusively with their llvm version or am I mistaken?

I don't see that in the documentation linked here (https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake). It...

MPI issue on raspberry pi cluster

As already mentioned, you can have this functionality with the RPC backend, which is already merged. Check https://github.com/ggerganov/llama.cpp/pull/6829 and https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc for more details.

server: bench: continuous performance testing

Looks good. It would be nice to have other parameters in the matrix, such as different values of `-ngl`, but that's not important right now.
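A benchmark matrix is the Cartesian product of its parameter lists, so each added dimension, such as `-ngl`, multiplies the number of runs. The following is a minimal Python sketch. The model names and `-ngl` values are hypothetical placeholders, and the real bench job may define its matrix differently.

```python
from itertools import product

# Hypothetical placeholder values, not taken from the actual bench setup.
MODELS = ["model-a.gguf"]
NGL_VALUES = [0, 16, 99]


def bench_matrix(models, ngl_values):
    """Return one benchmark configuration per (model, -ngl) combination."""
    return [{"model": m, "ngl": n} for m, n in product(models, ngl_values)]
```

With two models and two `-ngl` values, this yields four configurations. Adding another parameter list would multiply the run count again, which is worth keeping in mind for CI time.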