cuda : ggml_mul_mat assert for padded src1
Currently, the padded matrix multiplications in whisper.cpp are silently failing with CUDA:
https://github.com/ggerganov/ggml/blob/dbd02958fa4f46898f68ca29c27ddcdc58a06f98/examples/whisper/whisper.cpp#L224-L230
The reason is that the `to_fp16_cuda` and `to_fp32_cuda` calls assume the data is not padded. We can either assert that the data is not padded, or over-allocate a buffer to account for the padding. The latter produces correct results, but is sub-optimal.
Drafting this PR to brainstorm some potential solutions.