
Proposal: Add a Naming Convention for GGUF Files to the Documentation

Open · mofosyne opened this issue 1 year ago

This has been merged into llama.cpp via https://github.com/ggerganov/llama.cpp/issues/7165, which also includes changes to how default filenames are generated.

However, I wasn't too sure where to place the proposed "GGUF Naming Convention". I think it should go in https://github.com/ggerganov/ggml/blob/master/docs/gguf.md, but I'm happy to hear otherwise.

In short, the naming convention I want to document is <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf (details of the proposal are in https://github.com/ggerganov/llama.cpp/pull/4858).
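
For illustration, a minimal Python sketch of how a tool might assemble such a filename (the helper name and the handling of the optional experts field are my own assumptions, not part of the proposal):

    # Hypothetical helper: assemble a filename per the proposed convention
    #   <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf
    from typing import Optional

    def gguf_filename(model: str, version: str, parameters: str,
                      quantization: str, experts_count: Optional[int] = None) -> str:
        size = f"{experts_count}x{parameters}" if experts_count else parameters
        return f"{model}-{version}-{size}-{quantization}.gguf"

    # gguf_filename("Mixtral", "v0.1", "7B", "Q2_K", experts_count=8)
    # -> "Mixtral-v0.1-8x7B-Q2_K.gguf"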

mofosyne avatar May 13 '24 04:05 mofosyne

On a side note... does it make sense to also standardize the conversion between the internal KV form and JSON (and back)? It doesn't seem to be an issue for Hugging Face, but it's something to consider.

mofosyne avatar May 14 '24 07:05 mofosyne

I am not sure if <BaseModel>-<Version>-<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf or <Model>-<Version>-<BaseModel>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf or something like it wouldn't be better. I am having a really hard time finding finetunes of mistral-7b-v0.3 on the Hugging Face Open LLM Leaderboard, because many model authors do not adhere to a standardized naming scheme and fail to mention the name of the base model. Both mistral-7b-v0.1 and v0.3 have the same parameter count, so differentiating by that property doesn't help either. The leaderboard is simply too cluttered, the search feature is insufficient, and model authors fail to provide relevant info.

Things need to change.

Edit: I also created a discussion at https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/761

ThiloteE avatar May 28 '24 13:05 ThiloteE

@ThiloteE yeah, I'm currently doing some extra refactoring in https://github.com/ggerganov/llama.cpp/pull/7499 and have some extra thoughts, so maybe have a look at my last comment there and add your 5c. (At the moment, for that refactoring, I was trying to figure out how to auto-estimate model size.)

If we need to, we can adjust the naming convention to encourage best practice, but we will need some guidance from the wider Hugging Face community on what they expect. E.g. do we need a variant field (e.g. -instruct), and should the version code actually go at the back of the filename?

(Also, do let us know if the internal KV store is missing any fields that would be handy for keeping track of model specs and naming.)

mofosyne avatar May 28 '24 15:05 mofosyne

@ThiloteE okay, I've now thought about this a bit further and think TheBloke had the right idea for a naming scheme:

    <basename>-<expert_count>x<per_model_weight>-<finetune>-<version>-<encoding>-<shard>.gguf
    Mixtral-8x7B-Instruct-v0.1-Q2_K.gguf
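
As a rough sketch of how that scheme could be parsed (which fields are optional, the character classes, and the shard suffix format are my own guesses here; the actual llama.cpp logic may differ):

    import re

    # <basename>-<expert_count>x<per_model_weight>-<finetune>-<version>-<encoding>-<shard>.gguf
    GGUF_NAME = re.compile(
        r"^(?P<basename>[A-Za-z0-9_.]+)"
        r"-(?:(?P<expert_count>\d+)x)?(?P<per_model_weight>\d+(?:\.\d+)?[KMB])"
        r"(?:-(?P<finetune>[A-Za-z0-9_]+))?"
        r"(?:-(?P<version>v\d+(?:\.\d+)*))?"
        r"(?:-(?P<encoding>[A-Za-z0-9_]+))?"
        r"(?:-(?P<shard>\d{5}-of-\d{5}))?"
        r"\.gguf$"
    )

    m = GGUF_NAME.match("Mixtral-8x7B-Instruct-v0.1-Q2_K.gguf")
    print(m.groupdict() if m else "no match")
    # {'basename': 'Mixtral', 'expert_count': '8', 'per_model_weight': '7B',
    #  'finetune': 'Instruct', 'version': 'v0.1', 'encoding': 'Q2_K', 'shard': None}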

So I've now updated my PR proposal in https://github.com/ggerganov/llama.cpp/pull/7499 and will update the gguf naming convention doc if the PR passes.

mofosyne avatar May 30 '24 17:05 mofosyne

So, no version number for the base model, but only a version for the finetune?

ThiloteE avatar May 30 '24 20:05 ThiloteE

(At the moment, for that refactoring, I was trying to figure out how to auto-estimate model size.)

@ThiloteE and I were looking into how to derive an approximate model parameter count from the information available.

I wrote a PowerShell script to pull the metadata with gguf-dump.py and followed this post https://kipp.ly/transformer-param-count/ to arrive at a formula that is close. The formula is based on this paper: https://arxiv.org/pdf/2112.00861

While the calculations match what is produced by kipply, I am not sure what may be missing from the formula; we believe it is off by about 7%. This may be due to how the factor should be found. I believe it should be different for each model, probably calculated, but I don't know how they found the factor they used.

    $p = ($factor * $block_count * [math]::Pow($embedding_length, 2)) + (($vocab_size * $embedding_length) * 2)

    embedding_length  8192
    block_count       64
    vocab_size        65536
    factor            12

    P = factor * block_count * embedding_length^2  = 51539607552
    Embeddings = vocab_size * embedding_length     =   536870912
    P + Embeddings                                 = 52076478464
    P + Embeddings + Unembeddings                  = 52613349376
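
The same estimate in Python (a sketch only, reproducing the numbers above; the factor of 12 is taken from the kipply post and, as noted, may need per-model adjustment):

    # Approximate transformer parameter count:
    #   P = factor * block_count * embedding_length^2
    # plus the embedding and unembedding matrices (vocab_size * embedding_length each).
    embedding_length = 8192
    block_count = 64
    vocab_size = 65536
    factor = 12  # from the kipply post; likely model-dependent (see above)

    p_blocks = factor * block_count * embedding_length ** 2  # 51539607552
    embeddings = vocab_size * embedding_length               # 536870912
    total = p_blocks + 2 * embeddings                        # 52613349376
    print(f"~{total / 1e9:.1f}B parameters")                 # ~52.6B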

3Simplex avatar May 30 '24 21:05 3Simplex

@ThiloteE (fair enough, let's keep the convo here)

        metadata.name = data.get("general.name")
        metadata.basename = data.get("general.basename")
        metadata.finetune = data.get("general.finetune")
        metadata.author = data.get("general.author")
        metadata.version = data.get("general.version")
        metadata.url = data.get("general.url")
        metadata.description = data.get("general.description")
        metadata.license = data.get("general.license")
        metadata.source_url = data.get("general.source.url")
        metadata.source_hf_repo = data.get("general.source.huggingface.repository")

I'm thinking of adding basename and finetune, but is there anything missing that you think should be included to make the KV store easier for search engines to read?
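
For context, this is roughly what the resulting metadata shape would look like with those additions (the dataclass below is just an illustration of the field set, not the actual llama.cpp code):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Metadata:
        # existing general.* KV fields
        name: Optional[str] = None
        author: Optional[str] = None
        version: Optional[str] = None
        url: Optional[str] = None
        description: Optional[str] = None
        license: Optional[str] = None
        source_url: Optional[str] = None
        source_hf_repo: Optional[str] = None
        # proposed additions discussed here
        basename: Optional[str] = None  # general.basename
        finetune: Optional[str] = None  # general.finetune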

mofosyne avatar May 31 '24 16:05 mofosyne

In theory, I am of the opinion "add as little as possible, but as much as necessary", though I am not sure that holds in practice. In practice, "add as much as possible" might be easier to work with :D I don't know if all this data is available, but if it is, then:

  • DateTime, as in yyyy-mm-ddThh:mm:ss
  • parameter size, matching the Python regex \dx\db or \db
  • languages
  • names and URLs of datasets

I read that you take information from the README.md into account, right? Some model authors do put useful information in there. For example (a parsing sketch follows these snippets):

  • https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/blob/main/README.md
language:
  - multilingual
pipeline_tag: text-generation
tags:
  - nlp
  - code
  - vision
  • https://huggingface.co/mlabonne/NeuralDaredevil-8B-abliterated/blob/main/README.md
license: other
tags:
  - dpo
datasets:
  - mlabonne/orpo-dpo-mix-40k
model-index:
  - name: Daredevil-8B-abliterated-dpomix
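
For what it's worth, a minimal sketch of pulling that README.md frontmatter into a dict (this assumes PyYAML; the '---' delimiter handling is simplified and the function name is my own):

    import yaml  # PyYAML

    def read_model_card(path: str) -> dict:
        # Hugging Face model cards begin with a YAML block delimited by '---' lines.
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if text.startswith("---"):
            frontmatter = text.split("---", 2)[1]
            return yaml.safe_load(frontmatter) or {}
        return {}

    card = read_model_card("README.md")
    print(card.get("tags"), card.get("datasets"), card.get("model-index"))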

ThiloteE avatar May 31 '24 19:05 ThiloteE

Just a heads up for anyone interested: the https://github.com/ggerganov/llama.cpp/pull/7499 PR is ready for review.

I've now taken @ThiloteE's suggestion into account, adding tags, datasets, and model-index.name to the GGUF KV store, in the interest of making it accessible for purposes like a leaderboard.

I've decided to keep to this {name}{parameters}{finetune}{version}{precision} arrangement, as we really want to keep the filename focused only on information directly related to the model in question; things like the base model version are best referenced in the KV store instead.

mofosyne avatar Jun 02 '24 14:06 mofosyne

Thank you @mofosyne for your work! 👍

ThiloteE avatar Jun 02 '24 14:06 ThiloteE

@ThiloteE no probs. It's teamwork; everyone contributed a bit!

https://github.com/ggerganov/llama.cpp/pull/7499 is now merged, so we may need to update the GGUF documentation to reflect the newly agreed-upon form.

mofosyne avatar Jul 18 '24 11:07 mofosyne

https://github.com/ggerganov/ggml/pull/896 is the documentation change for the ggml GGUF standard, waiting to be merged.

mofosyne avatar Jul 21 '24 07:07 mofosyne