Diego Devesa

361 comments by Diego Devesa

If it looks ok with the CPU backend, then it is probably an issue in the CUDA/HIP backend. I don't know why the nans are being generated; we would need to investigate.

I already tried disabling FP16 mat mul and `LLAMA_CUDA_FORCE_MMQ`, and it still results in nans. The model is likely corrupted; I think there are nans in the block scales, but...

So this has nothing to do with FP16 precision. The block scales of `blk.21.ffn_up.weight` have nans, which results in nans when performing the matrix multiplication. In the CPU backend, the...
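As a rough illustration of what "nans in the block scales" means here (this is not the author's actual debugging code), below is a minimal C sketch that scans raw Q2_K tensor data for `nan` super-block scales. It assumes llama.cpp's `block_q2_K` layout: 16 packed 4-bit sub-block scales, 64 bytes of 2-bit quants, then two fp16 super-block scales `d` and `dmin`; the names `f16_is_nan` and `count_nan_scales` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define QK_K 256

/* Assumption: mirrors llama.cpp's block_q2_K layout (packed, no padding). */
typedef struct {
    uint8_t  scales[QK_K/16]; /* 4-bit scales and mins for 16 sub-blocks */
    uint8_t  qs[QK_K/4];      /* 2-bit quantized weights                 */
    uint16_t d;               /* fp16 super-block scale for the scales   */
    uint16_t dmin;            /* fp16 super-block scale for the mins     */
} block_q2_K;

/* An IEEE fp16 value is nan when the exponent bits are all ones and the
   mantissa is non-zero. */
static bool f16_is_nan(uint16_t h) {
    return (h & 0x7C00) == 0x7C00 && (h & 0x03FF) != 0;
}

/* Scan a tensor's raw Q2_K blocks and report any whose fp16 super-block
   scales are nan; a single such block poisons every dot product that
   reads it. */
static size_t count_nan_scales(const block_q2_K *blocks, size_t n_blocks) {
    size_t bad = 0;
    for (size_t i = 0; i < n_blocks; i++) {
        if (f16_is_nan(blocks[i].d) || f16_is_nan(blocks[i].dmin)) {
            printf("block %zu: d=0x%04x dmin=0x%04x\n",
                   i, (unsigned) blocks[i].d, (unsigned) blocks[i].dmin);
            bad++;
        }
    }
    return bad;
}

int main(void) {
    /* Synthetic example: one block with a quiet-nan bit pattern in d. */
    block_q2_K blk = {0};
    blk.d = 0x7E00;
    printf("nan-scaled blocks: %zu\n", count_nan_scales(&blk, 1));
    return 0;
}
```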

I was talking specifically about the `Meta-Llama-3-70B.i1-Q2_K.gguf` model that produces nans with CUDA and HIP.

The main reason this received more attention is because it pointed to a bug in a backend, which is something I am interested in and can act upon immediately...

I only speak for myself. If you want the "llama.cpp position", ask @ggerganov. Please don't make me have to prefix everything I say with "in my opinion". The main goal...

Specifically, this model had `nan` values in the block scales. `nan` values propagate through every operation, so at the very least this would fill the entire layer with `nan` values...

There is simply no way that llama.cpp can guarantee that a model with nans will work at all. In most cases it will result in the entire activation being...
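To make that failure mode concrete, here is a small, illustrative C sketch (the `dot` helper is hypothetical, not llama.cpp code) of why a single `nan` weight wipes out an entire output: in a dot product, `nan` absorbs every multiply and add it touches, including `nan * 0`, so the accumulated activation becomes `nan`.

```c
#include <math.h>
#include <stdio.h>

/* Plain dot product, like a single output element of a matmul. */
static float dot(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

int main(void) {
    float weights[4] = {0.5f, -1.0f, 2.0f, 0.25f};
    float input[4]   = {1.0f,  1.0f, 1.0f, 1.0f};

    printf("clean:    %f\n", dot(weights, input, 4)); /* 1.750000 */

    /* One corrupted weight, e.g. dequantized with a nan block scale,
       poisons the whole accumulated result... */
    weights[2] = nanf("");
    printf("poisoned: %f\n", dot(weights, input, 4)); /* nan */

    /* ...and zeroing the input does not help: nan * 0 is still nan. */
    printf("zero in:  %f\n", dot(weights, (float[4]){0, 0, 0, 0}, 4));
    return 0;
}
```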

We could do better in this respect, but the "CUDA error 700" in the linked issue indicates a programming error, and something like that is probably never going to be...