Pierrick Hymbert comments

Results: 83 comments by Pierrick Hymbert

llama.cpp with mistral-7b-instruct-v0.2.Q5_K_M.gguf performance comparison between Intel CPU, nVIDIA GPU and Apple M1/M2

Hello @a-b-n-e-o, please increase the KV cache size with `--n-ctx` and experiment with the batch size `-b`.

Can't get server to run with more than 6 slots

@pudepiedj What is the issue with `LOG_INFO`? You can now switch to text format with `--log-format text`.

Can't get server to run with more than 6 slots

> > @pudepiedj What are the issue with the LOG_INFO ? you can now switch to text format with `--log-format text`.
>
> There is no issue: it's nicely implemented;...

Error when converting safe tensors to gguf

@ieatbeansbruh Hi, make sure you have downloaded all the model files into the `models/Smaug` folder. In particular, it looks like there is no `model-00001-of-*.safetensors`. Please confirm with `ls -al models/Smaug`.
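
As a complement to the `ls -al` check, here is a minimal sketch that lists which shard files are absent from a model folder. It assumes the shards follow the `model-NNNNN-of-MMMMM.safetensors` naming pattern hinted at above; the helper name `missing_shards` is hypothetical and is not part of llama.cpp or its conversion scripts.

```python
import re
from pathlib import Path

# Assumed shard naming pattern: model-00001-of-00003.safetensors
SHARD_RE = re.compile(r"model-(\d+)-of-(\d+)\.safetensors$")


def missing_shards(model_dir: str) -> list[int]:
    """Return the 1-based shard indices that are absent from model_dir."""
    found = set()
    total = 0
    for path in Path(model_dir).glob("model-*-of-*.safetensors"):
        match = SHARD_RE.match(path.name)
        if match:
            found.add(int(match.group(1)))
            # The "of-MMMMM" part tells us how many shards to expect.
            total = max(total, int(match.group(2)))
    if total == 0:
        # No shard matched the pattern at all, so nothing can be inferred.
        raise FileNotFoundError(f"no sharded safetensors files in {model_dir}")
    return [i for i in range(1, total + 1) if i not in found]
```

Calling `missing_shards("models/Smaug")` would return an empty list when every shard is present, and the missing indices otherwise.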

Infinite loop of "context shift"

> Sorry, but the patch has not resolved the issue for me. Here is a simple example how to generate:
> #server: ./server -m llama-2-7b.Q5_K_S.gguf --n-gpu-layers 33 --ctx-size 2048 --parallel 1...

Infinite loop of "context shift"

The user can set the `--n-predict` option to cap the number of tokens any completion request can generate, or pass `n_predict`/`max_tokens` in the request body. Otherwise, an infinite-loop scenario can occur...
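
To illustrate the request-body option, here is a minimal sketch that builds a completion request with an explicit `n_predict` cap. The URL (local host, port 8080, `/completion` path) and the helper name `build_completion_request` are assumptions for illustration; adjust them to your own server setup.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is listening locally on this address.
SERVER_URL = "http://localhost:8080/completion"


def build_completion_request(prompt: str, n_predict: int = 128) -> urllib.request.Request:
    """Build a POST request whose JSON body caps generation at n_predict tokens."""
    if n_predict <= 0:
        # Choice made for this sketch: insist on a positive, finite cap.
        raise ValueError("n_predict must be positive to bound generation")
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server. The function only constructs the request, so it can be inspected without a server.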

Infinite loop of "context shift"

> > [...] maybe the default `--n-predict` must be set to `--ctx-size`.
>
> @phymbert That would not fix the problem because the bug is caused by overflowing the context...

`quantize`: add imatrix and dataset metadata in GGUF

We might also add the number of chunks the imatrix was computed with.

`quantize`: add imatrix and dataset metadata in GGUF

@ggerganov, is this general approach relevant?

`quantize`: add imatrix and dataset metadata in GGUF

@slaren, could you please take a second look and merge it if approved?