
Enabled configurable auto Tensor Parallelism (TP) for the inference of diverse models

Open gyou2021 opened this issue 1 year ago • 3 comments

Auto TP in auto_tp.py needs to handle Linear-type modules in emerging complex models:

1) The output of some Linear modules in a model requires an all-reduce operation after running on multiple HPU/GPU cards, but the names of those modules may differ from the names recognized in the method tp_parser().
2) The weight of some Linear modules in a model CANNOT be split across multiple HPU/GPU cards.
3) The weight of some Linear modules in a model should NOT be split across multiple HPU/GPU cards, because the subsequent all-gather operation (gathering results from all cards) would hurt performance.

In case 1) the Linear type should be changed to DeepSpeed's AllReduceLinear type; in cases 2) and 3) the modules should keep the Linear type. To handle these cases easily, configurable auto TP is proposed: the method tp_parser() adds the Linear modules of case 1) (the module name list is stored in the environment variable 'allReduceLinearItems'), and the method _replace_module() adds the Linear modules of cases 2) and 3) (the module name list is stored in the environment variable 'keepLinearItems'). Both lists are configurable, either directly through environment variables or through a configuration file.
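As a rough illustration, the configurable lists described above could be read like this. This is a hedged sketch, not the actual DeepSpeed code: the environment variable names match the proposal, but the comma-separated format and the helper name `load_module_list` are assumptions.

```python
import os

def load_module_list(env_var: str) -> list[str]:
    """Return module names from an environment variable, e.g.
    allReduceLinearItems="mlp.down_proj,self_attn.o_proj" (format assumed)."""
    raw = os.environ.get(env_var, "")
    return [name.strip() for name in raw.split(",") if name.strip()]

# Case 1: modules whose Linear type should become AllReduceLinear.
all_reduce_items = load_module_list("allReduceLinearItems")
# Cases 2 and 3: modules that must keep the plain Linear type.
keep_linear_items = load_module_list("keepLinearItems")
```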

gyou2021 avatar Sep 18 '24 12:09 gyou2021

Hi @gyou2021 I like the goal of avoiding repetition of the same logic from L296 to L315, but I also have a concern that models enabled by these lines will not be able to run out-of-the-box with this PR. This may not be friendly to self-helping users without access to proper BKC documentation for various models.

Could allReduceLinearItems have an initial value as a built-in list, then be prepended with entries from os.environ for runtime configurability? I think if a model to be enabled via the environment is a public model, it should be contributed to the built-in list to provide an OOB experience, right?
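The suggestion above could be sketched as follows. The built-in entries and the function name are placeholders for illustration, not actual DeepSpeed defaults:

```python
import os

# Hypothetical built-in defaults for known public models (placeholders).
BUILTIN_ALL_REDUCE_LINEAR_ITEMS = ["down_proj", "o_proj"]

def resolve_all_reduce_items() -> list[str]:
    """Prepend runtime-configured module names to the built-in list,
    so public models keep working out-of-the-box."""
    raw = os.environ.get("allReduceLinearItems", "")
    runtime = [n.strip() for n in raw.split(",") if n.strip()]
    # Runtime entries come first; built-ins are kept unless duplicated.
    return runtime + [n for n in BUILTIN_ALL_REDUCE_LINEAR_ITEMS if n not in runtime]
```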

delock avatar Sep 19 '24 01:09 delock

@microsoft-github-policy-service agree

@microsoft-github-policy-service agree company="Intel"

gyou2021 avatar Sep 29 '24 06:09 gyou2021

Hi @delock and @gyou2021 - what more needs to be done to complete this PR? Just a review/approval? Any other changes?

loadams avatar Jan 08 '25 19:01 loadams

@loadams let me check with gyou on this PR status.

delock avatar Jan 14 '25 06:01 delock

Sure. I updated the code to enable it to run out-of-the-box. Thank you for your comments.

> Hi @gyou2021 I like the goal to avoid repetition of same logic from L296 to L315, but I also have concern that models enabled by these lines will not be able to run out-of-box with this PR. This may not be friendly to self-helping users without access to proper BKC documentation to various models.
>
> Could allReduceLinearItems have an initial value as a built-in list, then pre-pend with os.environment to get runtime configurability? I think if the model to be enabled by environment is a public model, it should be contributed to the built-in list to provide OOB experience, right?

gyou2021 avatar Jan 20 '25 06:01 gyou2021

@loadams my questions are all resolved and I have no further question to @gyou2021 , thanks!

delock avatar Jan 21 '25 09:01 delock

Hi @loadams , is this PR under review? Thanks!

delock avatar Feb 20 '25 13:02 delock

> Hi @loadams , is this PR under review? Thanks!

Hi @delock - sorry for the delay in this, we will work on getting this reviewed.

loadams avatar Feb 20 '25 15:02 loadams