Daniele Morotti

Results: 4 comments by Daniele Morotti

Hi, the binding for that parameter is `max_tokens`.

Yes, and the `--n-predict` option in `llama.cpp` won't force that many tokens unless you ignore the EOS token, since generation stops as soon as EOS is produced, as explained [here](https://github.com/ggml-org/llama.cpp/blob/4524290e87b8e107cc2b56e1251751546f4b9051/examples/main/README.md?plain=1#L171). So I'm not sure whether that is what you were looking for,...
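To make the interaction between a token limit and the EOS token concrete, here is a toy simulation. It is not `llama.cpp` code: the `generate` function, the `EOS` marker, and the token stream are all hypothetical, and it only illustrates why an upper bound on tokens does not act as a lower bound unless EOS is ignored.

```python
from typing import Iterable, List

EOS = "</s>"  # hypothetical end-of-sequence marker for this toy example


def generate(stream: Iterable[str], n_predict: int, ignore_eos: bool = False) -> List[str]:
    """Collect tokens until n_predict is reached or, unless ignored, EOS appears."""
    out: List[str] = []
    for token in stream:
        if len(out) >= n_predict:
            break  # the limit is only an upper bound
        if token == EOS and not ignore_eos:
            break  # EOS ends generation early
        out.append(token)
    return out


# A made-up token stream in which the "model" emits EOS after two tokens.
stream = ["Hello", "world", EOS, "extra", "tokens", "here"]

short = generate(stream, n_predict=5)                  # stops at EOS
full = generate(stream, n_predict=5, ignore_eos=True)  # runs up to the limit

print(short)
print(full)
```

With EOS respected, the output is shorter than the requested limit; with EOS ignored, generation continues until the limit is hit, and the EOS token itself is kept as an ordinary token.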

Hi, if you check the code at `llama_cpp/server/app.py`, some parameters are explicitly excluded:

```python
...
exclude = {
    "n",
    "logit_bias_type",
    "user",
    "min_tokens",
}
kwargs = body.model_dump(exclude=exclude)
```

I think the...
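The effect of that exclusion set can be sketched without the server itself. The snippet below is a standard-library stand-in: it uses a dataclass and a dict comprehension instead of the Pydantic `model_dump(exclude=...)` call quoted above, and the `CompletionRequest` class, its fields' defaults, and `to_kwargs` are hypothetical names chosen for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class CompletionRequest:
    """Hypothetical request body; not the real server model."""
    prompt: str
    max_tokens: int = 16
    min_tokens: int = 0
    n: int = 1
    user: Optional[str] = None
    logit_bias_type: Optional[str] = None


# Same field names as the exclusion set quoted from the server code.
EXCLUDE = {"n", "logit_bias_type", "user", "min_tokens"}


def to_kwargs(body: CompletionRequest) -> Dict[str, Any]:
    """Drop excluded fields before forwarding the rest as keyword arguments."""
    return {key: value for key, value in asdict(body).items() if key not in EXCLUDE}


body = CompletionRequest(prompt="Hi", max_tokens=32, min_tokens=8)
kwargs = to_kwargs(body)
print(kwargs)
```

In this sketch `min_tokens=8` is silently dropped while `max_tokens=32` is forwarded, which is the behaviour the exclusion set suggests: a client can send an excluded field, but it never reaches the completion call through these keyword arguments.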