DeepSpeed
[QUESTION] How to figure out correct `injection_policy` for Flan-T5
I would like to use DeepSpeed-Inference with the Flan-T5 model, and I have the following code:
```python
import os

import deepspeed
from deepspeed.inference.config import DeepSpeedInferenceConfig, DeepSpeedTPConfig
from transformers import T5ForConditionalGeneration, T5Tokenizer, pipeline
from transformers.models.t5.modeling_t5 import T5Block


def get_model():
    model_name = "google/flan-t5-small"
    tensor_parallel = int(os.getenv("TENSOR_PARALLEL_DEGREE", "2"))
    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    model = T5ForConditionalGeneration.from_pretrained(
        model_name, device_map="auto"
    )
    tokenizer = T5Tokenizer.from_pretrained(model_name)

    # create the DeepSpeed inference engine
    config = DeepSpeedInferenceConfig(
        replace_with_kernel_inject=True,
        dtype=model.dtype,
        tensor_parallel=DeepSpeedTPConfig(
            enabled=True, tp_size=tensor_parallel, mpu=None, tp_group=None
        ),
        injection_policy={
            T5Block: ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")
        },
    )
    model = deepspeed.init_inference(
        model,
        config=config,
    )
    generator = pipeline(
        task="text2text-generation", model=model, tokenizer=tokenizer, device=local_rank
    )
    return generator
```
Basically, I'm wondering whether I can use the `T5Block` class in the `injection_policy` for the Flan-T5 model, since it belongs to the same model family. How can I figure out whether this will work without more or less blindly trying it out?
More generally, how can I find information on the requirements of an `injection_policy` for a given model, and how can I verify that an `injection_policy` actually makes sense?
I have read:
- https://deepspeed.readthedocs.io/en/latest/inference-init.html
- https://www.deepspeed.ai/tutorials/inference-tutorial/#initializing-for-inference
but wasn't able to find an answer to my question.
I actually just saw that a PR on validating users' `injection_policy` is in progress: https://github.com/microsoft/DeepSpeed/pull/2630 ❤️
Hi @ivo-1 , have you made any headway with this? I am interested in the same!
Hi @ivo-1 and @alexcoca. I am also interested in figuring out the correct `injection_policy` for Flan-T5!
According to this auto tensor parallelism doc, the T5 model should no longer need an injection policy?
Thanks @brevity2021 for the doc.
I want to use the Flan-UL2 model, but couldn't find it in either the supported or the unsupported model list.
Can anyone help figure out the correct `injection_policy` for Flan-UL2?
Has anyone been able to determine the correct `injection_policy` for Flan-UL2, or confirmed whether it is supported or unsupported?