
The temperature parameter does not seem to work

Open ibutenko opened this issue 1 year ago • 2 comments

Hi! Code sample first:

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
from IPython.display import display, Markdown

chat_history = []

llama2_wrapper = LLAMA2_WRAPPER(
    backend_type="gptq",
)

user_input = input("You: ")
response_generator = llama2_wrapper.run(
    user_input,
    chat_history=chat_history,
    max_new_tokens=1000,
    temperature=0.15,
    system_prompt="",
)

Prompt: How was Tupac Shakur influenced by Nirvana?

Wrapper initialization output:

Running on GPU with backend torch transformers.
Model path is empty.
Use default gptq model path: ./models/Llama-2-7b-Chat-GPTQ
Model exists in ./models/Llama-2-7b-Chat-GPTQ
The safetensors archive passed at ./models/Llama-2-7b-Chat-GPTQ\model.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.

Issue: no matter how I change the temperature parameter (I have tried 0, 1, -1, 0.1, etc.), the response does not change (the prompt above is just a sample; I see the same issue with any other prompt). Simplifying the call to print(llama2_wrapper(prompt, temperature=0.15)) does not help either. All other parameters work just fine.

At the same time, when I use the Llama 2 UI on Replicate and change the temperature, the answers change too, and the model stops hallucinating when the temperature is set to ~0.8 or less. Am I doing something wrong, or does the parameter simply not pass through the wrapper?
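For context on what I expect: temperature divides the logits before the softmax, so a near-zero temperature should collapse sampling to the argmax token while a high temperature flattens the distribution and makes outputs vary. This is a minimal, self-contained sketch of that behavior (generic sampling code, not llama2_wrapper's actual implementation) — with a working temperature parameter, the sampled tokens should visibly change like this:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    # Scale logits by 1/T, then softmax (shifted by max for stability).
    # T -> 0 approaches greedy (argmax) decoding; large T flattens the
    # distribution so lower-probability tokens get sampled too.
    scaled = [l / max(temperature, 1e-6) for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]  # toy vocabulary of 3 tokens
rng = random.Random(0)

# Near-zero temperature: effectively always picks the argmax token.
low_t = {sample_with_temperature(logits, 0.01, rng) for _ in range(100)}
# High temperature: distribution flattens, other tokens appear as well.
high_t = {sample_with_temperature(logits, 5.0, rng) for _ in range(100)}
print(low_t)   # only the argmax index
print(high_t)  # multiple indices
```

In the wrapper, by contrast, the output is byte-for-byte identical regardless of the temperature value I pass.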

I would appreciate any advice. Thanks! Ilya

ibutenko avatar Oct 14 '23 20:10 ibutenko