llama2-webui
The temperature parameter does not seem to work
Hi! Code sample first:
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
from IPython.display import display, Markdown
chat_history = []
llama2_wrapper = LLAMA2_WRAPPER(
    backend_type="gptq",
)
user_input = input("You: ")
response_generator = llama2_wrapper.run(
    user_input,
    chat_history=chat_history,
    max_new_tokens=1000,
    temperature=0.15,
    system_prompt="",
)
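(For completeness, this is how I read the answer; as far as I can tell, run() yields the incrementally growing response text, so I keep the last yielded value:)
# run() appears to yield the growing response string, so the last
# yielded value should be the full answer.
response = ""
for response in response_generator:
    pass
display(Markdown(response))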
Prompt: How was Tupac Shakur influenced by Nirvana?
Wrapper initialization output:
Running on GPU with backend torch transformers.
Model path is empty.
Use default gptq model path: ./models/Llama-2-7b-Chat-GPTQ
Model exists in ./models/Llama-2-7b-Chat-GPTQ
The safetensors archive passed at ./models/Llama-2-7b-Chat-GPTQ\model.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
Issue: no matter how I change the temperature parameter (I tried 0, 1, -1, 0.1, etc.), the response does not change (the prompt above is just a sample; I see the same behavior with any other prompt). Simplifying the call to print(llama2_wrapper(prompt, temperature=0.15)) does not help either. All other parameters work just fine.
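To isolate the problem, here is the check I would run next: bypass the wrapper and pass temperature to generate() directly. As far as I know, in transformers the temperature only takes effect when do_sample=True, so if the wrapper calls generate() without enabling sampling, the temperature would be silently ignored. This is a minimal diagnostic sketch, not llama2-webui code; the auto_gptq loading flags are my assumptions:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Diagnostic only: load the same GPTQ checkpoint directly and compare
# outputs at two temperatures (the flags below are my guesses).
model_path = "./models/Llama-2-7b-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoGPTQForCausalLM.from_quantized(
    model_path, device="cuda:0", use_safetensors=True, use_triton=False
)

prompt = "How was Tupac Shakur influenced by Nirvana?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

for temp in (0.15, 1.5):
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,  # temperature is ignored unless sampling is enabled
        temperature=temp,
    )
    print(f"--- temperature={temp} ---")
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
If the two outputs differ here but stay identical through the wrapper, then the temperature is most likely not being forwarded (or sampling is not enabled) inside the wrapper.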
At the same time, when I use the Llama 2 UI on Replicate and change the temperature, the answers change too, and the model stops hallucinating when the temperature is set to about 0.8 or lower. Is this something I am doing wrong, or is the parameter not passed through the wrapper?
I would appreciate any advice. Thanks! Ilya