DeepSpeed [BUG] Unexpected Inference results with Galactica Model

Open allanj opened this issue 1 year ago • 0 comments

Describe the bug

I use the HuggingFace implementation to run inference with Galactica. In HuggingFace, the Galactica model is implemented on top of the OPT model.

My first attempt at generation produced some weird output, so I investigated the injection policy code for OPT.

The activation in Galactica is GELU, but the OPT injection policy uses ReLU at the line linked here, which leads to the unexpected generation results: https://github.com/microsoft/DeepSpeed/blob/cc67f22f60f7ec20c61a53b4c3da4e8799d7de5a/deepspeed/module_inject/containers/opt.py#L77
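To illustrate why the activation mismatch matters, here is a minimal standalone sketch comparing the two functions on a few inputs. It assumes the exact erf-based definition of GELU; the function names `relu` and `gelu` are written here for illustration and are not taken from DeepSpeed or transformers.

```python
import math


def relu(x: float) -> float:
    # ReLU clamps every negative input to zero.
    return max(0.0, x)


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# The two functions disagree most visibly for small negative inputs,
# where ReLU returns 0 but GELU returns a small negative value.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

Since weights trained with GELU expect those small negative values to pass through, applying ReLU in every MLP layer would plausibly compound into the kind of degraded generations described above.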

To Reproduce

import os
import deepspeed
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
deepspeed.init_distributed("nccl")
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

model_name= 'facebook/galactica-125m'
tokenizer = AutoTokenizer.from_pretrained(model_name)

kernel_inject = True

model = AutoModelForCausalLM.from_pretrained(model_name)

model = model.eval()
kwargs = dict(replace_with_kernel_inject=kernel_inject)

model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    **kwargs,
)
model = model.module


instruction = "Answer the following question through step-by-step reasoning."
cot_trigger = "Answer: Let's think step by step."
question = "If \\frac {1}{2} is greater than \\frac {M}{16} then M could be (A) 7 (Correct) (B) 8 (C) 9 (D) 10 (E) 32"
input_text = f"{instruction}\nQuestion: {question}\n{cot_trigger}\n"

res = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(res["input_ids"].to(torch.cuda.current_device()), do_sample=True, max_new_tokens=1024, num_return_sequences=5)
if torch.distributed.get_rank() == 0:
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

Temporary solution

I could not work out how to add a new container class, and there is no documentation on how, once the new class is added, a model from HuggingFace transformers gets matched to it. When I load the Galactica model from HuggingFace, it is automatically treated as OPT and the OPT injection policy is applied.
My current workaround is to modify opt.py directly, replacing "ReLU" with "GELU". I don't think that is a general solution, so please advise.
Is there a more general solution?
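In the meantime, one less invasive stopgap than editing opt.py might be to check the model config before enabling kernel injection. The sketch below is only an assumption about how such a guard could look: `kernel_inject_is_safe` is a hypothetical helper, not a DeepSpeed API, and it relies on the `activation_function` field that OPT-style HuggingFace configs expose. The `SimpleNamespace` objects stand in for real configs so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def kernel_inject_is_safe(config, expected_activation: str = "relu") -> bool:
    """Hypothetical guard: True only if the config's activation matches
    the one the OPT injection policy assumes (ReLU at the linked line)."""
    activation = getattr(config, "activation_function", None)
    return activation == expected_activation


# Stand-ins for model.config; real configs would come from
# AutoModelForCausalLM.from_pretrained(...).config.
opt_like = SimpleNamespace(activation_function="relu")
galactica_like = SimpleNamespace(activation_function="gelu")

print(kernel_inject_is_safe(opt_like))        # True
print(kernel_inject_is_safe(galactica_like))  # False
```

In the reproduction script this could be used as `kernel_inject = kernel_inject_is_safe(model.config)`, so that Galactica falls back to running without kernel injection. That gives up the fused kernels, so it avoids the wrong outputs but is still not the general fix I am asking about.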

Apr 16 '23 05:04 allanj