Problem loading metadata of gguf file
I encountered an error while executing the example quantized-phi, which I slightly modified. However, I suspect the issue might not be with my modifications.
The problem seems to be related to the function candle_transformers::models::quantized_llama::ModelWeights::from_gguf. It appears to be unable to locate the necessary metadata in the model. This is interesting because Hugging Face is able to display the model's metadata correctly.
Here are some screenshots for further reference:
I would appreciate any assistance in resolving this issue. Thank you in advance.
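For what it's worth, here is a minimal sketch (assuming candle-core's gguf_file reader and a placeholder file path) that dumps the metadata keys candle sees in the file, so they can be compared against the llama.* keys that quantized_llama::ModelWeights::from_gguf looks for:

```rust
use candle_core::quantized::gguf_file;

fn main() -> anyhow::Result<()> {
    // Placeholder path: point this at the gguf file that fails to load.
    let mut file = std::fs::File::open("model.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;

    // List every metadata key so the naming convention used in the file
    // (e.g. phi3.* vs llama.*) is visible.
    let mut keys: Vec<_> = content.metadata.keys().collect();
    keys.sort();
    for key in keys {
        println!("{key}");
    }
    Ok(())
}
```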
You may want to use the latest GitHub version, as 0.4.1 may well not be compatible with phi-3. Also, you will need to use the --which phi-3 flag to specify that you're using this variant.
I found that the naming convention in the phi-3 metadata (and also in the tensor names) is different from llama, so we can't directly apply quantized_llama. Here is the from_gguf function, please check the notes.
It seems that there was a "silent" change of the naming convention in phi-3 gguf models, see #2154. candle now supports both the old and the new naming conventions in the quantized-phi example: Phi3 is the "new" version with a phi3 architecture and Phi3b is the version with a llama architecture.
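As a rough illustration of how that split can be detected (a hypothetical sketch, assuming the shape of candle-core's gguf_file::Value enum; this is not the actual example code):

```rust
use candle_core::quantized::gguf_file;

fn main() -> anyhow::Result<()> {
    // Placeholder path: the phi-3 gguf file in question.
    let mut file = std::fs::File::open("phi-3.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;

    // The converter records the architecture under `general.architecture`.
    let arch = match content.metadata.get("general.architecture") {
        Some(gguf_file::Value::String(s)) => s.as_str(),
        _ => "unknown",
    };
    match arch {
        // New naming convention: the Phi3 variant (quantized_phi3).
        "phi3" => println!("load with candle_transformers::models::quantized_phi3"),
        // Old naming convention: the Phi3b variant (quantized_llama).
        "llama" => println!("load with candle_transformers::models::quantized_llama"),
        other => println!("unexpected architecture: {other}"),
    }
    Ok(())
}
```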
I think the real reason for the problem is this. First, there are two different conversion methods:
- convert.py: converts the model to gguf with the architecture llama, always
- convert-hf-to-gguf.py: converts the model to gguf with the architecture taken from the given model
However, it appears that phi3 can only be converted using convert-hf-to-gguf.py, due to a NotImplementedError with the message: Unknown rope scaling type: su.
This inconsistency in conversion methods seems to have led to the problem. The left model in the screenshot was converted using convert-hf-to-gguf.py, while the right one was converted using convert.py.
I am wondering if candle could auto-detect the architecture from a gguf model converted by convert-hf-to-gguf.py and run it correctly, which would get to the root of the problem.
Sorry for the delayed response.
I would have thought that we support both methods now: the phi3 architecture with the quantized-phi example and the llama one with the quantized example. Doesn't that work for you?
It works out fine, but it would be excellent if quantized_llama could run all models converted by convert-hf-to-gguf.py.
There are many models to run, but I have to modify the architecture in candle_transformers::models::quantized_phi3 to run them normally. For example, to run the qwen model, just rename phi3 to qwen2.
But if quantized_phi3::ModelWeights::from_gguf didn't hard-code the architecture, we could run everything converted by convert-hf-to-gguf.py at once. A rough sketch of that idea is below.
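Purely as a hypothetical sketch (not candle's actual from_gguf code; the key names and Value variants below are assumptions): instead of hard-coding the phi3. prefix, the architecture read from general.architecture could be used to build the metadata keys, which is exactly why renaming phi3 to qwen2 happens to work.

```rust
use candle_core::quantized::gguf_file;

// Hypothetical sketch: build hyperparameter keys from the architecture name
// found in the file instead of a hard-coded "phi3" prefix.
fn read_head_count(content: &gguf_file::Content) -> anyhow::Result<u32> {
    let arch = match content.metadata.get("general.architecture") {
        Some(gguf_file::Value::String(s)) => s.clone(),
        _ => anyhow::bail!("missing general.architecture"),
    };
    // With arch = "phi3" this looks up "phi3.attention.head_count",
    // with arch = "qwen2" it looks up "qwen2.attention.head_count", etc.
    let key = format!("{arch}.attention.head_count");
    match content.metadata.get(&key) {
        Some(gguf_file::Value::U32(n)) => Ok(*n),
        _ => anyhow::bail!("missing or non-u32 metadata key: {key}"),
    }
}
```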