Johannes Gäßler
`./llama-bench` and `./perplexity` are broken with this PR, which is why I'm using `./main` for testing. Results for LLaMA 2 7b q4_0:

| GPU | Test | t/s master |...
Some single GPU test results:

| GPU | Model | Test | t/s master | t/s PR | Speedup |
|-------------|-----------------|-------|------------|--------|---------|
| 1x RTX 4090 | LLaMA 3 8b f16...
I just noticed that instead of using GitHub's built-in draft feature you added "DRAFT:" to the title. Please let me know when you think the PR is ready for review,...
How about this: in terms of performance, I think it would make sense to double-buffer ggml graph creation. So essentially start creating the next ggml graph while the previous...
I just opened a large PR for multi GPU support: https://github.com/ggerganov/llama.cpp/pull/1607
I didn't merge this PR because I wanted someone else to check it as well; as I said, I'm not very knowledgeable about Docker.
I think this is not an issue with the code but rather with the model from that particular repository. The repository says it took the importance matrix from another repository...
Anyways, I've uploaded the importance matrix I generated: https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices
Good to know, thanks for investigating.
>First, I don't know where JohannesGaessler got his wrong info from, but the repository of course does/did not say that the imatrix is from another repository - he made this...