
fix: Fixes wrong input type for raw_dtype in ggml to gguf scripts

Open farbodbj opened this issue 1 year ago • 3 comments

A wrong data type was being passed from add_tensor and add_tensor_info to this function, causing a second exception to be raised while the first one was being handled. I changed the input types to match the name raw_dtype, and later converted the value to a GGMLQuantizationType object, which also rejects invalid arguments itself. Related issue: #8929
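For context, gguf.GGMLQuantizationType is an IntEnum, so constructing it from a raw integer both converts and validates in one step. A minimal sketch with a hypothetical stand-in enum (the member values below are illustrative, not the real GGML type ids):

```python
from enum import IntEnum

# Hypothetical stand-in for gguf.GGMLQuantizationType;
# the member values here are illustrative, not the real GGML type ids.
class GGMLQuantizationType(IntEnum):
    F32 = 0
    F16 = 1
    Q4_0 = 2

# A valid raw integer maps cleanly to an enum member...
dtype = GGMLQuantizationType(1)
assert dtype is GGMLQuantizationType.F16

# ...while an unknown value raises ValueError instead of passing through silently.
try:
    GGMLQuantizationType(999)
    raised = False
except ValueError:
    raised = True
assert raised
```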

farbodbj avatar Aug 08 '24 10:08 farbodbj

Thanks for finding and fixing this. There have been many refactors lately, and the old convert_llama_ggml_to_gguf.py was not tested at all (mostly because I don't have old GGML models around to test with).

But I think the type of the raw_dtype parameter of GGUFWriter.add_tensor and GGUFWriter.add_tensor_info should stay GGMLQuantizationType (it's easier to figure out how to use that parameter when its type is more specific than int), and that convert_llama_ggml_to_gguf.py should be fixed instead.

Would this work?

diff --git a/convert_llama_ggml_to_gguf.py b/convert_llama_ggml_to_gguf.py
index 7b00b439..701df869 100755
--- a/convert_llama_ggml_to_gguf.py
+++ b/convert_llama_ggml_to_gguf.py
@@ -116,7 +116,7 @@ class Tensor:
         assert quant is not None, 'Unknown tensor type'
         (blksize, tysize) = quant
         offset += 12
-        self.dtype= dtype
+        self.dtype= gguf.GGMLQuantizationType(dtype)
         self.dims = struct.unpack(f'<{n_dims}I', data[offset:offset + (4 * n_dims)])
         offset += 4 * n_dims
         self.name = bytes(data[offset:offset + name_len])

compilade avatar Aug 10 '24 16:08 compilade

@compilade You're right, using GGMLQuantizationType is more readable and self-validating. I tested your suggested fix (created a patch and applied it), and it resolves this issue, so I used it and committed the changes. There is another problem with this script too; I'll create an issue for it soon and try to fix it, but I first need to understand the GGML and GGUF formats better. Thanks for your review.

farbodbj avatar Aug 10 '24 18:08 farbodbj

Is the failed CI check required for merging this PR, and do I need to do anything about it? It does not seem to be related to this PR.

farbodbj avatar Aug 11 '24 16:08 farbodbj

> Is the failed CI check required for merging this PR, and do I need to do anything about it? It does not seem to be related to this PR.

You don't need to do anything about it; a fix is pending in #8982, and the source of the problem was identified in https://github.com/ggerganov/llama.cpp/pull/7599#discussion_r1712901202.

compilade avatar Aug 11 '24 17:08 compilade

@compilade @ggerganov
Since this PR has been approved and labeled as merge-ready but has had no activity in the past 5 days, I'm mentioning you here to keep it from becoming stale, in case any action is needed on it.

farbodbj avatar Aug 16 '24 09:08 farbodbj

Thanks for the reminder!

ggerganov avatar Aug 16 '24 10:08 ggerganov