Fomenko
Please provide more information about your hardware and software versions, and which model version you are using.
> CUDA isn't deterministic unl...

You need to provide more information, like the model, the temperature setting, the quantization type, and so on. In my opinion the quantization is the problem.
> make LLAMA_HIPBLAS=1 -j4

```
(base) user@myusb:~/llama.cpp$ make LLAMA_HIPBLAS=1 -j4
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG...
```
OK, I compiled it, and now when I run:

```
(base) user@myusb:~/llama.cpp$ ./main -i -m ~/mistral-7b-instruct-v0.2.Q8_0.gguf -ngl 999
Log start
main: build = 2047 (b05102fe)
main: built with cc (Ubuntu...
```
```
(base) user@myusb:~/llama.cpp$ rocm-smi
======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device  [Model : Revision]  Temp  Power  Partitions  SCLK  MCLK  Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20...
```
### Transformers

**Here is Transformers; it works properly if you chat.**

**And this is the behavior of Transformers if you don't chat, only load the...**
> I can't reproduce with just `main -i -m foo.gguf -ngl 999`. Can you?

Can you please show your GPU behavior in the 5 states? And tell us your hardware, software...
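To capture what the GPU is doing across those states, a simple polling loop over `rocm-smi` can help. This is a rough sketch, not an official procedure; the `--showuse` flag and the one-second interval are my assumptions, adjust to taste:

```shell
#!/bin/sh
# Sketch: sample GPU utilization a few times while llama.cpp sits at
# its interactive prompt; --showuse prints the current GPU use percentage.
for i in 1 2 3 4 5; do
    rocm-smi --showuse
    sleep 1
done
```

Running this once per state (model loading, generating, waiting at the prompt, and so on) would give comparable numbers for each.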
> llama.cpp, and more specifically the CUDA backend, is single-threaded. While waiting for user input, there is no other code running and no work being submitted to the...
> It really isn't. I suggest that you take a look at AMD's debugging tools to try to understand what the GPU is doing while it should be idle, assuming...
> ```shell
> export GPU_MAX_HW_QUEUES=1
> ```

Thank you, setting the variable solved my problem. Why can't llama.cpp set this internally? And how does Transformers solve this problem...