Diego Devesa
Currently it has some limitations that I suspect would prevent using it with backends that only implement a small subset of operations. However, it was designed from the start...
Can you share specific instructions to reproduce the `nan` issue? Ideally with the smallest model that you are aware of that has the issue. Or run a `git bisect` to find...
I was able to reproduce the issue with `wikitext-2-raw/wiki.test.raw` at chunk 9, with a command such as: ``` ./imatrix -ofreq 20 -t 4 -ngl 20 -mg 0 -m models/BagelMix-8x7B.Q8_0.gguf -o imatrix-bagelmix.dat -f wikitext-2-raw/wiki.test.raw...
This does not happen outside of interactive mode, so I don't think this is right. This would mean 1.5 GB more memory usage for a 2048 context size, so it...
Increasing the batch size also makes llama.cpp run out of memory, so any solution that only considers the context size and not the batch size is likely wrong.
Thinking more about this, does it really matter what the initial value of `buf_size` is, as long as it is big enough for the first dummy call to `llama_eval`? This...
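To make the idea concrete, here is a minimal, self-contained sketch of the measure-then-grow pattern being described. The `bytes_used()` helper, the per-token cost, and the batch size are hypothetical stand-ins, not the actual `llama_eval`/ggml code:

```c
// Sketch only: the initial buf_size just has to cover a 1-token dummy eval;
// after measuring the per-token cost, the buffer is grown for real batches.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint8_t * buf      = NULL;
static size_t    buf_size = 0;

// Hypothetical placeholder for "how much memory did an eval over n_tokens use"
// (roughly what ggml_used_mem() reports after a real eval).
static size_t bytes_used(int n_tokens) {
    return (size_t) n_tokens * 4u * 1024 * 1024; // assumption: ~4 MiB per token
}

int main(void) {
    // 1. small initial allocation, only needs to be enough for the dummy call
    buf_size = bytes_used(1);
    buf      = malloc(buf_size);

    // 2. dummy eval with a single token to measure the per-token cost
    size_t mem_per_token = bytes_used(1);

    // 3. grow the buffer for the real batch size before doing actual evals
    int    n_batch = 512;
    size_t needed  = mem_per_token * n_batch;
    if (needed > buf_size) {
        buf      = realloc(buf, needed);
        buf_size = needed;
    }

    printf("buf_size after warm-up: %zu bytes\n", buf_size);
    free(buf);
    return 0;
}
```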
I think this is a good change, but I am concerned that we won't be able to change QK without breaking backwards compatibility, and if we ever want to support...
Right, I was thinking of @qwopqwop200's implementation, which showed some benefits from using a group size of 128 (if I understood that correctly). But as you say, that is a...
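For context on why changing QK is a backwards-compatibility problem: the quantization block layout, and therefore the on-disk layout of every quantized tensor, is defined in terms of QK. The sketch below shows the rough shape of a 4-bit block; the exact fields are illustrative rather than the actual ggml definitions:

```c
#include <stdint.h>
#include <stdio.h>

#define QK 32                       // elements per quantization block

// Rough shape of a 4-bit quantization block; fields are illustrative.
typedef struct {
    float   d;                      // per-block scale
    uint8_t qs[QK / 2];             // QK 4-bit quants, packed two per byte
} block_q4_0;

int main(void) {
    // The stored size of every quantized tensor is a multiple of this block
    // size, so changing QK (e.g. to a group size of 128) changes the file
    // layout and older readers can no longer parse existing models.
    printf("bytes per block of %d weights: %zu\n", QK, sizeof(block_q4_0));
    return 0;
}
```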
This originally comes from https://github.com/ggerganov/llama.cpp/commit/8f8c28e89cb9531211783da697d6e7c445e2af1d. My guess is that it was done this way due to the directory structure of the original llama-2 distribution files.
https://rocm.docs.amd.com/projects/radeon/en/latest/docs/limitations.html