DeepSpeed
[QUESTION] How to figure out correct `injection_policy` for Flan-T5
I would like to use DeepSpeed-Inference with the Flan-T5 model, and I have the following code:
```python
import os

import deepspeed
from deepspeed.inference.config import DeepSpeedInferenceConfig, DeepSpeedTPConfig
from transformers import T5ForConditionalGeneration, T5Tokenizer, pipeline
from transformers.models.t5.modeling_t5 import T5Block


def get_model():
    model_name = "google/flan-t5-small"
    tensor_parallel = int(os.getenv("TENSOR_PARALLEL_DEGREE", "2"))
    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    model = T5ForConditionalGeneration.from_pretrained(
        model_name, device_map="auto"
    )
    tokenizer = T5Tokenizer.from_pretrained(model_name)

    # create the DeepSpeed inference engine
    config = DeepSpeedInferenceConfig(
        replace_with_kernel_inject=True,
        dtype=model.dtype,
        tensor_parallel=DeepSpeedTPConfig(
            enabled=True, tp_size=tensor_parallel, mpu=None, tp_group=None
        ),
        injection_policy={
            T5Block: ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")
        },
    )
    model = deepspeed.init_inference(
        model,
        config=config,
    )
    generator = pipeline(
        task="text2text-generation", model=model, tokenizer=tokenizer, device=local_rank
    )
    return generator
```
Basically, I'm wondering whether I can use the `T5Block` class in the `injection_policy` for the Flan-T5 model, since it belongs to the same model family. How can I figure out whether this will work without more or less blindly trying it out?
More generally, how can I find information on the requirements of an `injection_policy` for a given model, and how can I verify that an `injection_policy` actually makes sense?
I have read:
- https://deepspeed.readthedocs.io/en/latest/inference-init.html
- https://www.deepspeed.ai/tutorials/inference-tutorial/#initializing-for-inference
but wasn't able to find an answer to my question.
I actually just saw that a PR on validating users' `injection_policy` is in progress: https://github.com/microsoft/DeepSpeed/pull/2630 ❤️
Hi @ivo-1 , have you made any headway with this? I am interested in the same!
Hi @ivo-1 and @alexcoca. I am also interested in figuring out the correct `injection_policy` for Flan-T5!
According to this auto tensor parallelism doc, the T5 model should no longer need an injection policy?
Thanks @brevity2021 for the doc.
I want to use the Flan-UL2 model, but couldn't find it in either the supported or the unsupported model list.
Can anyone help figure out the correct `injection_policy` for Flan-UL2?
Has anyone been able to determine the correct `injection_policy` for Flan-UL2, or confirmed whether it is supported or unsupported?