SharifIsmail

Results 3 comments of SharifIsmail

@JohannesGaessler > The CUDA backend is deterministic as in the results for the same input parameters will have the same output logits. However, if you use >1 slots or prompt...

I see. Thanks, @compilade and @JohannesGaessler. So running higher-precision models with a higher-precision KV cache would alleviate this effect, right?
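To make the precision question concrete, here is a toy sketch in plain Python. It is not llama.cpp or CUDA code. It assumes the differences come from floating-point additions being carried out in a different order, which is a general property of floating-point arithmetic rather than something confirmed in this thread. The helper names `round_to` and `accumulate` are hypothetical, and the narrower formats are emulated with the standard `struct` module.

```python
import struct

def round_to(fmt, x):
    """Round a Python float to a narrower IEEE 754 format ('e' = fp16, 'f' = fp32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def accumulate(values, fmt):
    """Sum values left to right, rounding the running total after every add."""
    total = 0.0
    for v in values:
        total = round_to(fmt, total + v)
    return total

# Same three numbers, summed in two different orders.
values = [2048.0, 1.0, 1.0]

fp16_forward = accumulate(values, "e")
fp16_reverse = accumulate(reversed(values), "e")
fp32_forward = accumulate(values, "f")
fp32_reverse = accumulate(reversed(values), "f")

print(fp16_forward, fp16_reverse)  # 2048.0 2050.0
print(fp32_forward, fp32_reverse)  # 2050.0 2050.0
```

In this particular example the fp16 sums disagree while the fp32 sums agree, because fp16 cannot represent 2049 and rounds it back to 2048. Higher precision shrinks the rounding error per operation, but it does not make addition associative, so order-dependent differences can still appear with other inputs.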

Out of curiosity, I ran some quick tests with "Phi-3-mini-4k-instruct-**fp16**.gguf" vs. "Phi-3-mini-4k-instruct-**q4**.gguf". Bottom line: as you stated, JohannesGaessler, both are nondeterministic in the vast majority of cases. Even...