
return_dict_in_generate not working for model.generate after ipex.llm.optimize

Open YYue000 opened this issue 1 year ago • 2 comments

Describe the bug

import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3.1-8B"

dtype = "bfloat16"
amp_enabled = dtype != "float32"
amp_dtype = getattr(torch, dtype)
config = AutoConfig.from_pretrained(model_name, torchscript=True)

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=amp_dtype, config=config, low_cpu_mem_usage=True, trust_remote_code=True)
model = model.to(memory_format=torch.channels_last)
model = model.eval()
model_org = model  # kept unoptimized; inplace=False below leaves it untouched

model = ipex.llm.optimize(model, dtype=amp_dtype, inplace=False, deployment_mode=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids  # example prompt, not in the original report

model.generate(input_ids, use_cache=True, return_dict_in_generate=True)
model_org.generate(input_ids, use_cache=True, return_dict_in_generate=True)

After ipex.llm.optimize, model.generate returns only a plain tensor, while model_org.generate returns a dict-like output as expected.

Is there an explanation or a workaround for this? Thanks.
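Until this is resolved, a small duck-typed helper (the name is mine, not part of transformers or ipex) can normalize both return shapes: GenerateDecoderOnlyOutput carries the generated token ids in a sequences attribute, while the optimized path returns the token tensor directly.

```python
def get_sequences(out):
    # GenerateDecoderOnlyOutput (and the other Generate*Output classes)
    # expose generated token ids as `.sequences`; the ipex-optimized
    # model in 2.4.0+cpu returns that tensor directly instead.
    return getattr(out, "sequences", out)
```

With this, `tokens = get_sequences(m.generate(input_ids, use_cache=True, return_dict_in_generate=True))` yields the token tensor regardless of which of the two models is used.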

Versions

2.4.0+cpu

YYue000 avatar Oct 22 '24 05:10 YYue000

@YYue000 Thanks for reporting this issue. We will look into it and give feedback later.

wangkl2 avatar Oct 22 '24 07:10 wangkl2

@YYue000 I can reproduce this issue on 2.4.0+cpu: a) without ipex.llm.optimize(), invoking model.generate() with return_dict_in_generate=True returns an object of type transformers.generation.utils.GenerateDecoderOnlyOutput, from which attributes such as sequences can be retrieved; b) with ipex.llm.optimize(), model.generate() always returns a plain tensor.

This issue has been fixed by our dev team. The fix will be included in the upcoming ipex 2.5 release.

wangkl2 avatar Oct 25 '24 07:10 wangkl2


@YYue000 IPEX v2.5.0+cpu was released yesterday. The return_dict_in_generate issue is resolved by this commit: https://github.com/intel/intel-extension-for-pytorch/commit/584a4e2e2c6193b926554f951d2608489cac5d7a. Please verify on your side.
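As a sketch (the helper name is mine, not part of ipex), code that depends on return_dict_in_generate could gate on the installed version before relying on the fixed behavior:

```python
def ipex_has_generate_fix(version: str) -> bool:
    """Return True if an ipex version string (e.g. "2.5.0+cpu") is at
    least 2.5.0, the release said to contain the return_dict_in_generate
    fix. Local build tags such as "+cpu" are ignored."""
    base = version.split("+")[0]
    major, minor, *_ = (int(p) for p in base.split("."))
    return (major, minor) >= (2, 5)
```

For example, `ipex_has_generate_fix(ipex.__version__)` after `import intel_extension_for_pytorch as ipex`.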

wangkl2 avatar Nov 06 '24 01:11 wangkl2