Problem loading metadata of gguf file
I encountered an error while executing the example quantized-phi, which I slightly modified. However, I suspect the issue might not be with my modifications.
The problem seems to be related to the function candle_transformers::models::quantized_llama::ModelWeights::from_gguf. It appears to be unable to locate the necessary metadata in the model. This is interesting because Hugging Face is able to display the model's metadata correctly.
Here are some screenshots for further reference:
I would appreciate any assistance in resolving this issue. Thank you in advance.
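For what it's worth, here is a minimal sketch (assuming candle-core's gguf_file reader and a placeholder file path) that dumps the metadata keys candle sees in the file, so they can be compared against the llama.* keys that quantized_llama::ModelWeights::from_gguf looks for:

```rust
use candle_core::quantized::gguf_file;

fn main() -> anyhow::Result<()> {
    // Placeholder path: point this at the gguf file that fails to load.
    let mut file = std::fs::File::open("model.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;

    // List every metadata key so the naming convention used in the file
    // (e.g. phi3.* vs llama.*) is visible.
    let mut keys: Vec<_> = content.metadata.keys().collect();
    keys.sort();
    for key in keys {
        println!("{key}");
    }
    Ok(())
}
```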
You may want to use the latest GitHub version, as 0.4.1 may well not be compatible with phi-3. Also, you will need to use the --which phi-3 flag to specify that you're using this variant.
I found that the naming convention in the phi-3 metadata (and also in the tensor names) is different from llama, so we can't directly apply quantized_llama. Here is the from_gguf function, please check the notes.
It seems that there was a "silent" change of the naming convention in phi-3 gguf models, see #2154. candle now supports both the old and the new naming conventions in the quantized-phi example: Phi3 is the "new" version with a phi3 architecture and Phi3b is the version with a llama architecture.
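As a rough illustration of how that split can be detected (a hypothetical sketch, assuming the shape of candle-core's gguf_file::Value enum; this is not the actual example code):

```rust
use candle_core::quantized::gguf_file;

fn main() -> anyhow::Result<()> {
    // Placeholder path: the phi-3 gguf file in question.
    let mut file = std::fs::File::open("phi-3.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;

    // The converter records the architecture under `general.architecture`.
    let arch = match content.metadata.get("general.architecture") {
        Some(gguf_file::Value::String(s)) => s.as_str(),
        _ => "unknown",
    };
    match arch {
        // New naming convention: the Phi3 variant (quantized_phi3).
        "phi3" => println!("load with candle_transformers::models::quantized_phi3"),
        // Old naming convention: the Phi3b variant (quantized_llama).
        "llama" => println!("load with candle_transformers::models::quantized_llama"),
        other => println!("unexpected architecture: {other}"),
    }
    Ok(())
}
```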
I think the real reason for the problem is this. First, there are two different conversion methods:
- convert.py: converts the model to gguf with the architecture llama, always
- convert-hf-to-gguf.py: converts the model to gguf with the architecture taken from the given model
However, it appears that phi3 can only be converted using convert-hf-to-gguf.py, due to a NotImplementedError with the message: Unknown rope scaling type: su.
This inconsistency in conversion methods seems to have led to the problem. The left model in the screenshot was converted using convert-hf-to-gguf.py, while the right one was converted using convert.py.
I am wondering if candle could auto-detect the architecture from a gguf model converted by convert-hf-to-gguf.py and run it correctly, which would get to the root of the problem.
Sorry for the delayed response.
I would have thought that we support both methods now: the phi3 architecture with the quantized-phi example and the llama one with the quantized example. Doesn't that work for you?
It works out fine, but it would be excellent if quantized_llama could run all models converted by convert-hf-to-gguf.py.
There are many models to run, but I have to modify the architecture in candle_transformers::models::quantized_phi3 to run them normally. For example, to run the qwen model, just rename phi3 to qwen2.
But if quantized_phi3::ModelWeights::from_gguf didn't hard-code the architecture, we could run everything converted by convert-hf-to-gguf.py at once. A rough sketch of that idea is below.
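Purely as a hypothetical sketch (not candle's actual from_gguf code; the key names and Value variants below are assumptions): instead of hard-coding the phi3. prefix, the architecture read from general.architecture could be used to build the metadata keys, which is exactly why renaming phi3 to qwen2 happens to work.

```rust
use candle_core::quantized::gguf_file;

// Hypothetical sketch: build hyperparameter keys from the architecture name
// found in the file instead of a hard-coded "phi3" prefix.
fn read_head_count(content: &gguf_file::Content) -> anyhow::Result<u32> {
    let arch = match content.metadata.get("general.architecture") {
        Some(gguf_file::Value::String(s)) => s.clone(),
        _ => anyhow::bail!("missing general.architecture"),
    };
    // With arch = "phi3" this looks up "phi3.attention.head_count",
    // with arch = "qwen2" it looks up "qwen2.attention.head_count", etc.
    let key = format!("{arch}.attention.head_count");
    match content.metadata.get(&key) {
        Some(gguf_file::Value::U32(n)) => Ok(*n),
        _ => anyhow::bail!("missing or non-u32 metadata key: {key}"),
    }
}
```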