schmorp

45 comments by schmorp

The BagelMix NaN happens with b2355 as well. And the NaN issue correlates with the crash in at least one case (miqu-1-120b). Also, the same imatrix fails the same way...

With BagelMix, it happens all the way back to b2060 (Feb 4). Not sure what to test next - it seems to affect only CUDA, in practically all versions.

@slaren ah, wow, thanks for tracking it down! That's probably why it works on the CPU, then. What's strange is that it seems to affect a lot of models that...

LLAMA_CUDA_FORCE_MMQ does indeed seem to work around this, with no discernible speed loss in my config either. In the meantime, I have added this to catch this specific problem earlier, maybe...
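
(Not actual llama.cpp code - just a minimal sketch of the kind of early NaN check I mean; the function name and call site are made up for illustration, assuming you have a float buffer such as collected imatrix activations:)

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Illustrative only: scan a float buffer and bail out early if any value
// is not finite, instead of discovering the NaN much later during
// quantization. "what" is just a label for the error message.
static bool buffer_has_non_finite(const float * data, size_t n, const char * what) {
    for (size_t i = 0; i < n; ++i) {
        if (!std::isfinite(data[i])) {
            fprintf(stderr, "%s: non-finite value %f at index %zu\n", what, data[i], i);
            return true;
        }
    }
    return false;
}
```

Something like this, run right after the activations are accumulated, at least fails fast with a pointer to the offending tensor rather than hours later.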

For the next reader suffering from these problems: LLAMA_CUDA_FORCE_MMQ does not, unfortunately, work in all cases (example model: Meidebenne-120b-v1.0).

And an anecdotal statistic: at the moment, roughly a third of the models I quantize on Hugging Face either have trouble during imatrix generation or later during IQ1_S or other quants.

Just wanted to add that this still affects a large number of models - almost half of the Llama-3 models I quantize can't generate IQ3_XXS or other i-quants without NaNs...

Out of curiosity, did the resulting GGUF sizes also change?

Does it also change when you repeat it? LLM inference is typically non-deterministic (sampling at temperature > 0 draws tokens randomly), so there is no expectation that the same question yields the same answer. If that is the issue,...
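
(A toy illustration of the point, not llama.cpp code - sampling from the same token distribution with different seeds can pick different tokens:)

```cpp
#include <cstdio>
#include <random>
#include <vector>

int main() {
    // Pretend these are softmaxed logits for three candidate tokens.
    const std::vector<double> probs = {0.5, 0.3, 0.2};
    std::discrete_distribution<int> dist(probs.begin(), probs.end());

    // Two "runs" seeded independently, as a non-seeded chat session would be.
    std::mt19937 run1(std::random_device{}());
    std::mt19937 run2(std::random_device{}());

    printf("run 1 picks token %d, run 2 picks token %d\n", dist(run1), dist(run2));
    // Fixing the seed (or using greedy, temperature-0 decoding) is what
    // makes repeated runs reproducible.
}
```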

Hi, mradermacher here. I was pointed at this PR, and I am getting a lot of negative feedback because of the misinformation spread here. First, I don't know where JohannesGaessler got...