ipex-llm
Model output is different when using default optimize_model
While testing ipex-llm, I observed that the model output changes after calling optimize_model(), which defaults to sym_int4. Please help clarify the following:
- What is causing this variation in output?
- Does calling optimize_model() ensure that model accuracy stays the same on eval benchmarks such as HumanEval and MMLU?
Thanks!
env: Python 3.9, ipex-llm 2.1.0b20240416, torch 2.2.2, transformers 4.31.0
reproducer:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
import sys
import warnings
warnings.filterwarnings("ignore")
import torch
torch.manual_seed(100)
from transformers import AutoTokenizer, AutoModelForCausalLM
model_path = 'meta-llama/Llama-2-7b-chat-hf'
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             trust_remote_code=True,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
system_prompt = "You are a creative poet. Write a poem about the given topic. Use only 100 words"
user_prompt = "Write a poem about owls and starry nights"
prompt_template = f"<s>[INST] <<SYS>>\n {system_prompt} \n<</SYS>>\n\n {user_prompt} [/INST]"
print("*"*10 + "Original model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()
from ipex_llm import optimize_model
model = optimize_model(model)
print("*"*10 + "IPEX-LLM Optimized model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()
output:
**********Original model output**********
[INST] <<SYS>>
You are a creative poet. Write a poem about the given topic. Use only 100 words
<</SYS>>
Write a poem about owls and starry nights [/INST] Sure! Here is a 100-word poem about owls and starry nights:
Silent sentinels of the night,
Owls perch on boughs, their eyes alight.
Glittering stars above, a twinkling sight,
A magical night, pure delight.
Converting the current model to sym_int4 format......
**********IPEX-LLM Optimized model output**********
[INST] <<SYS>>
You are a creative poet. Write a poem about the given topic. Use only 100 words
<</SYS>>
Write a poem about owls and starry nights [/INST] Sure, here is a poem about owls and starry nights in exactly 100 words:
Owls hoot in the night's embrace
Their soft coos echo through space
While stars twinkle bright and slow
A celestial show to know
Nature's symphony so grand
In this peaceful night's command
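The two poems above are free-form generated text, so they are hard to compare directly. If the checkpoint's generation config enables sampling, the two generate() calls also draw from different random states, because the seed is set only once; passing do_sample=False to generate() would make decoding greedy. A more direct check is to compare next-token logits from the two models. The sketch below shows only the comparison step, in plain Python, on made-up numbers; compare_logits is a hypothetical helper, not part of ipex-llm or transformers.

```python
def compare_logits(ref, test):
    """Return (max absolute difference, whether the top-1 token index matches)."""
    if len(ref) != len(test):
        raise ValueError("logit vectors must have the same length")
    max_abs_diff = max(abs(r - t) for r, t in zip(ref, test))
    top1_ref = max(range(len(ref)), key=ref.__getitem__)
    top1_test = max(range(len(test)), key=test.__getitem__)
    return max_abs_diff, top1_ref == top1_test


# Toy next-token logits (illustrative values, not measured from any model).
original = [2.10, 2.05, -1.00, 0.30]
quantized = [2.02, 2.08, -0.95, 0.28]

diff, same_top1 = compare_logits(original, quantized)
print(f"max |diff| = {diff:.2f}, same top-1 token: {same_top1}")
```

In this toy case the logits differ only slightly, yet the top-1 token changes because the two leading candidates were nearly tied. Once one token differs, the rest of the generated text can diverge.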
Hi,
We are applying further optimizations in ipex-llm for better performance. These may change some logits and therefore some outputs, which is expected. At the same time, we run accuracy benchmarks (e.g. the tasks in https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) to make sure that our optimizations have no obvious negative impact on accuracy. If you observe any wrong output from the ipex-llm optimized model, feel free to tell us and we will check it. Thanks!
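To illustrate why low-bit conversion changes logits at all, here is a minimal sketch of symmetric 4-bit quantization on a single block of weights. It is a simplified assumption about what a format like sym_int4 does, not ipex-llm's actual kernel; quantize_sym_int4 and dequantize are hypothetical helper names.

```python
def quantize_sym_int4(weights):
    """Map one block of floats to integers in [-8, 7] with a shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


# Illustrative weights, not taken from any real model.
weights = [0.12, -0.53, 0.98, -0.05, 0.40, -0.90, 0.25, 0.66]
codes, scale = quantize_sym_int4(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("int4 codes:", codes)
print("max round-trip error:", max(errors))
```

Each restored weight is off by at most half a scale step. Those small errors accumulate through the matrix multiplications of every layer, so the final logits shift slightly even though the model is otherwise unchanged.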