DeepSpeed
[BUG] DeepSpeed Inference - T5 Model
Describe the bug
I used DeepSpeed inference like below:
model = (
    T5ForConditionalGeneration.from_pretrained(
        "paust/pko-t5-large",
    ).half().eval().to(torch.cuda.current_device())
)
model = deepspeed.init_inference(
    model,
    mp_size=8,
    dtype=torch.float,
    injection_policy={T5Block: ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')},
)
This works well (I use 8x A100 80GB GPUs). But when I change the model size (large => base), it does not work:
model = (
    T5ForConditionalGeneration.from_pretrained(
        "paust/pko-t5-base",
    ).half().eval().to(torch.cuda.current_device())
)
model = deepspeed.init_inference(
    model,
    mp_size=8,
    dtype=torch.float,
    injection_policy={T5Block: ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')},
)
It fails with the error message below:
File "../../lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 559, in forward
scores += position_bias_masked
RuntimeError: The size of tensor a (384) must match the size of tensor b (256) at non-singleton dimension 3
I set max_length to 256, and I checked that the size of `scores` is 1.5 times the max_length I set. Can you tell me why?
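For context, here is a minimal sketch of the broadcasting rule that produces this RuntimeError in `scores += position_bias_masked`. The concrete shapes (batch size, number of heads) are hypothetical and only illustrate the reported mismatch at dimension 3 (384 vs 256):

```python
def broadcast_compatible(shape_a, shape_b):
    # NumPy/PyTorch rule: two dimensions are compatible if they are
    # equal or one of them is 1 (checked right-aligned; here the
    # shapes have equal rank, so a plain zip suffices).
    return all(a == b or a == 1 or b == 1 for a, b in zip(shape_a, shape_b))

scores_shape = (1, 12, 384, 384)  # hypothetical: (batch, heads, q_len, k_len)
bias_shape   = (1, 12, 384, 256)  # position bias built for max_length=256

# Dimension 3 holds 384 on one side and 256 on the other, so the
# in-place addition cannot broadcast and PyTorch raises the error.
print(broadcast_compatible(scores_shape, bias_shape))  # False
print(384 / 256)  # 1.5, matching the observed ratio to max_length
```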