Pierrick Hymbert

83 comments by Pierrick Hymbert

Hello @a-b-n-e-o, please increase the KV cache size with `--n-ctx` and experiment with the batch size `-b`.

@pudepiedj What is the issue with `LOG_INFO`? You can now switch to text format with `--log-format text`.

> > @pudepiedj What are the issue with the LOG_INFO ? you can now switch to text format with `--log-format text`. > > There is no issue: it's nicely implemented;...

Hi @ieatbeansbruh, make sure you have downloaded all model files into the `models/Smaug` folder. In particular, it looks like `model-00001-of-*.safetensors` is missing. Please confirm with `ls -al models/Smaug`.
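To go one step beyond eyeballing the `ls -al` output, the check can be automated. The sketch below assumes the usual sharded naming `model-NNNNN-of-MMMMM.safetensors` (five-digit, zero-padded indices); the helper name `missing_shards` is hypothetical and not part of any llama.cpp tool.

```python
import re
from typing import Iterable, List, Set

# Assumed shard naming, e.g. "model-00001-of-00003.safetensors".
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_shards(filenames: Iterable[str]) -> List[str]:
    """Return the expected shard file names that are absent from `filenames`.

    Hypothetical helper: it infers the total shard count from the
    "-of-MMMMM" suffix of whichever shards are present.
    """
    present: Set[int] = set()
    totals: Set[int] = set()
    for name in filenames:
        m = SHARD_RE.match(name)
        if m:
            present.add(int(m.group(1)))
            totals.add(int(m.group(2)))
    if not totals:
        # No shard files at all: the total count cannot be inferred.
        return []
    total = max(totals)
    return [
        f"model-{i:05d}-of-{total:05d}.safetensors"
        for i in range(1, total + 1)
        if i not in present
    ]
```

Usage: `missing_shards(os.listdir("models/Smaug"))` returns an empty list when every shard is present; otherwise it lists the files that still need to be downloaded.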

> Sorry, but the patch has not resolved the issue for me. Here is a simple example how to generate: #server: ./server -m llama-2-7b.Q5_K_S.gguf --n-gpu-layers 33 --ctx-size 2048 --parallel 1...

The user can set the `--n-predict` option to cap the number of tokens any completion request can generate, or pass `n_predict`/`max_tokens` in the request body. Otherwise, an infinite loop scenario can occur...
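A minimal sketch of the per-request cap, assuming a server listening on `http://localhost:8080`: the native `/completion` endpoint takes `n_predict`, while the OpenAI-compatible `/v1/chat/completions` endpoint takes `max_tokens`. The helper names and the default cap of 256 tokens are illustrative assumptions; the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def completion_body(prompt: str, n_predict: Optional[int] = 256) -> str:
    """JSON body for the native /completion endpoint, capped with `n_predict`.

    The default of 256 is an arbitrary illustrative cap.
    """
    body: Dict[str, object] = {"prompt": prompt}
    if n_predict is not None:
        body["n_predict"] = n_predict
    return json.dumps(body)


def chat_body(messages: List[Dict[str, str]], max_tokens: int = 256) -> str:
    """JSON body for the OpenAI-compatible endpoint, capped with `max_tokens`."""
    return json.dumps({"messages": messages, "max_tokens": max_tokens})


def build_request(base_url: str, path: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a JSON body."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_request("http://localhost:8080", "/completion", completion_body("Hello", n_predict=64)))`. Capping on the client side bounds each request individually, whereas `--n-predict` on the server bounds every request regardless of what the client sends.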

> > [...] maybe the default `--n-predict` must be set to `--ctx-size`. > > @phymbert That would not fix the problem because the bug is caused by overflowing the context...

We might also add the number of chunks the imatrix was computed with.

@ggerganov, is this general approach relevant?

@slaren, could you please take a second look and merge it if approved?