ggml
cuda : ggml_mul_mat assert for padded src1
Currently, the padded matrix multiplications in whisper.cpp are silently failing with CUDA:
https://github.com/ggerganov/ggml/blob/dbd02958fa4f46898f68ca29c27ddcdc58a06f98/examples/whisper/whisper.cpp#L224-L230
The reason is that the to_fp16_cuda and to_fp32_cuda calls assume no padding of the data. We can either assert that the data is not padded, or over-allocate a buffer accounting for the padding. The latter produces correct results, but is sub-optimal.
Drafting this PR to brainstorm some potential solutions.
Is there any benefit to using ggml_mul_mat_pad with CUDA (and especially with cuBLAS)? This seems to me like it should be an implementation detail of the matrix multiplication algorithm.
Can't test ggml_mul_mat_pad with CUDA because it would need some extra changes to to_fp16_cuda. There will likely be no benefit in this case anyway, since cuBLAS probably performs the necessary optimizations internally. For Metal it helps because the mat-mat kernel requires dim 0 to be a multiple of 32
But regardless, the issue remains if users try to perform matrix multiplications with padded views
I think that in general there is very poor support for any kind of non-contiguous tensors in ggml-cuda. We should definitely add more checks and improve support, but I am afraid that this is just the tip of the iceberg. We could extend test-backend-ops to automatically run tests with non-contiguous tensors for all the operations, so that we could have a better overall picture of what needs to be fixed.
At the moment I am working on moving llama.cpp fully to ggml-backend with ggml_backend_sched for partial offloading, and bringing everything up to feature parity with the old CUDA API. After that, we can remove a lot of code, globals, and generally do a lot of refactoring of the CUDA backend.
Sounds great - no rush on this padding problem
I think the Metal backend could benefit from expanded matrix multiplication support, which would significantly improve performance in some cases, such as the one discussed in the linked comment.