Pierrick Hymbert
@ggerganov, finally, I would prefer not to go this way but to stop the generation at `n_ctx` with a warning, instead of printing a warning each time if `n_predict` is...
@ggerganov @slaren please have a look at this proposal
> Maybe it would be simpler to set `n_predict` to `n_ctx_train` by default if not set in the request. Yeah, that was the first version, but I find it noisy...
I see. I am OK with both solutions, even though setting `n_predict` all the time will be sort of a breaking change. AFAIK not all models hallucinate and...
> This would be simple if context shifting was opt-in, then there would always be a hard limit of `n_ctx` tokens. I am not sure that enabling context shift by...
> @ggerganov up to you, but we need to address this recurring infinite-loop concern somehow. @ggerganov I think with the removal of hard-coded stop tokens, this PR...
> > Maybe it would be simpler to set `n_predict` to `n_ctx_train` by default if not set in the request. > > Yes, let's do that. Context-shift has to be...
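For illustration, the agreed-upon defaulting behavior could be sketched roughly as follows (a minimal sketch with a hypothetical helper name, not the actual server code): an unset or negative `n_predict` falls back to `n_ctx_train`, so a model that never emits EOS cannot generate forever.

```python
def effective_n_predict(requested, n_ctx_train):
    """Return the token budget for a completion request.

    requested   -- n_predict from the request, or None / -1 for "unlimited"
    n_ctx_train -- the model's training context size, used as the default cap
    """
    if requested is None or requested < 0:
        # No explicit limit: cap at the training context to avoid infinite loops.
        return n_ctx_train
    # An explicit n_predict from the request is respected as-is.
    return requested

# Unlimited requests are capped at the training context:
print(effective_n_predict(None, 4096))  # 4096
print(effective_n_predict(-1, 4096))    # 4096
# An explicit n_predict is respected:
print(effective_n_predict(128, 4096))   # 128
```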
Thanks. Would you mind adding a tests.sh as we did in #6655?
Great work. As we discussed previously, server test coverage matters, and adding a new scenario to the test framework is mandatory.
> Are there already any tests that assert correctness for the server? I didn't see any so as part of this implementation I would try to add some. https://github.com/ggerganov/llama.cpp/tree/master/examples/server/tests
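To make the requirement concrete, a new test scenario should assert the property this thread is about: generation terminates at the cap even when the model never emits an EOS token. A toy sketch of that assertion (the `generate` helper here is hypothetical, standing in for a real server request in the test framework):

```python
def generate(n_predict, emits_eos=False):
    """Toy generation loop: stops on EOS or after n_predict tokens."""
    tokens = []
    for _ in range(n_predict):
        tokens.append("tok")
        if emits_eos:
            tokens.append("</s>")
            break
    return tokens

# Without EOS, generation is bounded by the n_predict cap
# instead of looping forever:
assert len(generate(32)) == 32
# With EOS, generation stops early as usual:
assert generate(5, emits_eos=True)[-1] == "</s>"
```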