Diego Devesa

361 comments by Diego Devesa

If it looks ok with the CPU backend, then it is probably an issue in the CUDA/HIP backend. I don't know why the nans are being generated; we would need to investigate.

I already tried disabling FP16 mat mul and `LLAMA_CUDA_FORCE_MMQ`, and it still results in nans. The model is likely corrupted; I think there are nans in the block scales, but...

So this has nothing to do with FP16 precision. The block scales of `blk.21.ffn_up.weight` have nans, which results in nans when performing the matrix multiplication. In the CPU backend, the...
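As a rough illustration of what "nans in the block scales" means here (this is not the author's actual debugging code), below is a minimal C sketch that scans raw Q2_K tensor data for `nan` super-block scales. It assumes llama.cpp's `block_q2_K` layout: 16 packed 4-bit sub-block scales, 64 bytes of 2-bit quants, then two fp16 super-block scales `d` and `dmin`; the names `f16_is_nan` and `count_nan_scales` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define QK_K 256

/* Assumption: mirrors llama.cpp's block_q2_K layout (packed, no padding). */
typedef struct {
    uint8_t  scales[QK_K/16]; /* 4-bit scales and mins for 16 sub-blocks */
    uint8_t  qs[QK_K/4];      /* 2-bit quantized weights                 */
    uint16_t d;               /* fp16 super-block scale for the scales   */
    uint16_t dmin;            /* fp16 super-block scale for the mins     */
} block_q2_K;

/* An IEEE fp16 value is nan when the exponent bits are all ones and the
   mantissa is non-zero. */
static bool f16_is_nan(uint16_t h) {
    return (h & 0x7C00) == 0x7C00 && (h & 0x03FF) != 0;
}

/* Scan a tensor's raw Q2_K blocks and report any whose fp16 super-block
   scales are nan; a single such block poisons every dot product that
   reads it. */
static size_t count_nan_scales(const block_q2_K *blocks, size_t n_blocks) {
    size_t bad = 0;
    for (size_t i = 0; i < n_blocks; i++) {
        if (f16_is_nan(blocks[i].d) || f16_is_nan(blocks[i].dmin)) {
            printf("block %zu: d=0x%04x dmin=0x%04x\n",
                   i, (unsigned) blocks[i].d, (unsigned) blocks[i].dmin);
            bad++;
        }
    }
    return bad;
}

int main(void) {
    /* Synthetic example: one block with a quiet-nan bit pattern in d. */
    block_q2_K blk = {0};
    blk.d = 0x7E00;
    printf("nan-scaled blocks: %zu\n", count_nan_scales(&blk, 1));
    return 0;
}
```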

I was talking specifically about the `Meta-Llama-3-70B.i1-Q2_K.gguf` model that produces nans with CUDA and HIP.

The main reason this received more attention is because it pointed to a bug in a backend, which is something I am interested in and can act upon immediately...

I only speak for myself. If you want the "llama.cpp position", ask @ggerganov. Please don't make me have to prefix everything I say with "in my opinion". The main goal...

Specifically, this model had `nan` values in the block scales. `nan` values propagate through every operation, so at the very least this would fill the entire layer with `nan` values...

There is simply no way that llama.cpp can guarantee that a model with nans will work at all. In most cases it will result in the entire activation being...
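To make that failure mode concrete, here is a small, illustrative C sketch (the `dot` helper is hypothetical, not llama.cpp code) of why a single `nan` weight wipes out an entire output: in a dot product, `nan` absorbs every multiply and add it touches, including `nan * 0`, so the accumulated activation becomes `nan`.

```c
#include <math.h>
#include <stdio.h>

/* Plain dot product, like a single output element of a matmul. */
static float dot(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

int main(void) {
    float weights[4] = {0.5f, -1.0f, 2.0f, 0.25f};
    float input[4]   = {1.0f,  1.0f, 1.0f, 1.0f};

    printf("clean:    %f\n", dot(weights, input, 4)); /* 1.750000 */

    /* One corrupted weight, e.g. dequantized with a nan block scale,
       poisons the whole accumulated result... */
    weights[2] = nanf("");
    printf("poisoned: %f\n", dot(weights, input, 4)); /* nan */

    /* ...and zeroing the input does not help: nan * 0 is still nan. */
    printf("zero in:  %f\n", dot(weights, (float[4]){0, 0, 0, 0}, 4));
    return 0;
}
```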

We could do better in this respect, but the "CUDA error 700" in the linked issue indicates a programming error, and something like that is probably never going to be...