ggml
ggml copied to clipboard
[Bug] Zero temperature yields incorrect results
Passing the parameter --temp 0
causes GPT-J (and I suspect all other models) to behave very strangely. See the following output:
guberti@Precision-7540:~/ggml/build$ ./bin/gpt-j -p "Once upon a time there was a" --temp 0
main: seed = 1678657256
gptj_model_load: loading model from 'models/gpt-j-6B/ggml-model.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 1
gptj_model_load: ggml ctx size = 13334.86 MB
gptj_model_load: memory_size = 1792.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size = 11542.79 MB / num tensors = 285
main: number of tokens in prompt = 7
Once upon a time there was aGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
No matter what prompt is used, the model repeatedly generates the token G
.
Solution
This is caused by a division-by-zero error in examples/utils.cpp
:
{
const double scale = 1.0/temp;
for (int i = 0; i < n_logits; ++i) {
logits_id.push_back(std::make_pair(logits[i]*scale, i));
}
}
Happy to contribute a simple fix if @ggerganov is busy.