
llama: introduce support for model-embedded sampling parameters

Open taronaeo opened this issue 2 weeks ago • 8 comments

ref: #17088

This PR adds support for setting sampler parameters from GGUF KV metadata, allowing model creators to embed recommended sampler settings that apply unless explicitly overridden via CLI flags.

This is handy for users who do not want to tinker with sampler settings themselves but still want the recommended ones applied.

Priority of Sampling Parameters

  1. User flags (e.g., passing --temp 0.6)
  2. Model-embedded recommendation (e.g., general.sampling.temp = 0.6)
  3. Hardcoded default values in common_params_sampling
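The three-tier precedence can be sketched as follows. This is a hypothetical illustration, not the actual llama.cpp implementation; the function name and signature are made up for the example.

```python
# Hypothetical sketch of the three-tier precedence for one sampler
# parameter (e.g. temp). Not the actual llama.cpp code.
def resolve_param(cli_value, model_kv, key, default):
    """Return the effective value for a sampler parameter.

    cli_value: value from a user flag (None if the flag was not passed)
    model_kv:  dict of GGUF KV metadata, e.g. {"general.sampling.temp": 0.6}
    key:       the GGUF metadata key for this parameter
    default:   the hardcoded fallback (common_params_sampling default)
    """
    if cli_value is not None:   # 1. user flag wins
        return cli_value
    if key in model_kv:         # 2. model-embedded recommendation
        return model_kv[key]
    return default              # 3. hardcoded default

# Example: no --temp flag, model embeds 0.6, hardcoded default is 0.8
kv = {"general.sampling.temp": 0.6}
print(resolve_param(None, kv, "general.sampling.temp", 0.8))  # 0.6
print(resolve_param(0.2, kv, "general.sampling.temp", 0.8))   # 0.2
```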

Introduced Metadata

  • general.sampling.sequence
  • general.sampling.top_k
  • general.sampling.top_p
  • general.sampling.min_p
  • general.sampling.xtc_probability
  • general.sampling.xtc_threshold
  • general.sampling.temp
  • general.sampling.penalty_last_n
  • general.sampling.penalty_repeat
  • general.sampling.mirostat
  • general.sampling.mirostat_tau
  • general.sampling.mirostat_eta

Please let me know if we should introduce more sampling parameters.

Embedding From Safetensors into GGUF

By default, convert_hf_to_gguf.py will attempt to find generation_config.json within the model directory and automatically add the recommended sampler parameters to the GGUF metadata. If a sampling parameter is not available in that file, users can also supply it via --metadata metadata.json.

Note that --metadata metadata.json takes precedence over generation_config.json and will overwrite metadata if duplicate keys are found.
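That precedence amounts to a simple dictionary merge. The sketch below is illustrative only; the function name is made up and does not appear in the converter code.

```python
import json

# Hypothetical sketch of how the converter might merge metadata sources:
# keys from --metadata metadata.json overwrite duplicates pulled from
# generation_config.json.
def merge_sampling_metadata(generation_config, metadata_overrides):
    merged = dict(generation_config)   # start from generation_config.json
    merged.update(metadata_overrides)  # --metadata takes precedence
    return merged

gen_cfg = {"general.sampling.temp": 0.7, "general.sampling.top_p": 0.95}
overrides = json.loads('{"general.sampling.temp": 0.6}')
print(merge_sampling_metadata(gen_cfg, overrides))
# {'general.sampling.temp': 0.6, 'general.sampling.top_p': 0.95}
```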

$ cat > metadata.json << EOF 
{
    "general.sampling.temp": 0.6
}
EOF

$ python3 convert_hf_to_gguf.py --outfile deepseek-r1-distill-qwen-1.5b.gguf --metadata metadata.json deepseek-r1-distill-qwen-1.5b/

$ ./build/bin/llama-cli -m deepseek-r1-distill-qwen-1.5b.gguf -p "Write me a dog walking business idea 1. " -no-cnv -n 1 -t 10 2>&1 | grep "temp"    
llama_model_loader: - kv   2:                       general.sampling.temp f32             = 0.600000
llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
        top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.600
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist

taronaeo avatar Nov 09 '25 12:11 taronaeo

$ cat > metadata.json << EOF 
{
    "general.sampler.temp": 0.6
}
EOF

So, you're suggesting that parameters should be added manually before conversion? How likely is that to happen?

AFAIK most models come with recommended (though, some are likely to just be copy-pasted from somewhere) settings in generation_config.json, so perhaps a better idea to get them from there?

Edit: or is that automatically added to metadata?

CISC avatar Nov 09 '25 12:11 CISC

$ cat > metadata.json << EOF 
{
    "general.sampler.temp": 0.6
}
EOF

So, you're suggesting that parameters should be added manually before conversion? How likely is that to happen?

AFAIK most models come with recommended (though, some are likely to just be copy-pasted from somewhere) settings in generation_config.json, so perhaps a better idea to get them from there?

You're right, I didn't spot that. I guess I'll have to rework the code so that it pulls generation_config.json from the model directory and maps it to general.sampler.*, and then we can skip the --metadata flag.
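Such a mapping could look roughly like this. The field list and names below are illustrative, based on common generation_config.json contents, and are not the actual converter code.

```python
# Hypothetical mapping from generation_config.json fields to
# general.sampler.* GGUF KV keys (illustrative field list only).
FIELD_TO_KV = {
    "temperature": "general.sampler.temp",
    "top_k":       "general.sampler.top_k",
    "top_p":       "general.sampler.top_p",
    "min_p":       "general.sampler.min_p",
}

def sampler_kv_from_generation_config(config):
    """Keep only recognised sampling fields, renamed to GGUF KV keys."""
    return {FIELD_TO_KV[f]: v for f, v in config.items() if f in FIELD_TO_KV}

cfg = {"temperature": 0.6, "top_p": 0.95, "do_sample": True}
print(sampler_kv_from_generation_config(cfg))
# {'general.sampler.temp': 0.6, 'general.sampler.top_p': 0.95}
```

Unrecognised fields (such as do_sample) are simply dropped, which is why non-standard parameters like mirostat would still need another source.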

taronaeo avatar Nov 09 '25 12:11 taronaeo

I think the sampling sequence is important too. Also, I personally only really tend to use min-p and xtc (not in your proposal).

Green-Sky avatar Nov 09 '25 13:11 Green-Sky

@Green-Sky Will include general.sampler.xtc_probability and general.sampler.xtc_threshold first, then --samplers SEQUENCE.

@CISC RE generation_config.json vs. the custom --metadata file: I've realised that generation_config.json does not actually cover non-standard parameters such as mirostat. In that case we'll still need --metadata metadata.json support to cover those parameters, unless there is a better way of handling this.

taronaeo avatar Nov 09 '25 14:11 taronaeo

@CISC RE generation_config.json vs. the custom --metadata file: I've realised that generation_config.json does not actually cover non-standard parameters such as mirostat. In that case we'll still need --metadata metadata.json support to cover those parameters, unless there is a better way of handling this.

Does transformers even have this parameter?

CISC avatar Nov 09 '25 14:11 CISC

@CISC RE generation_config.json vs. the custom --metadata file: I've realised that generation_config.json does not actually cover non-standard parameters such as mirostat. In that case we'll still need --metadata metadata.json support to cover those parameters, unless there is a better way of handling this.

Does transformers even have this parameter?

Doesn't look like it. I followed some of Ollama's supported parameters: https://ollama.readthedocs.io/en/modelfile/#parameter

taronaeo avatar Nov 09 '25 14:11 taronaeo

@CISC any update on this PR?

taronaeo avatar Nov 14 '25 03:11 taronaeo

@CISC any update on this PR?

Thanks for the reminder. :)

CISC avatar Nov 14 '25 08:11 CISC