Enable autoTP for bloom

Open sywangyi opened this issue 1 year ago • 4 comments

sywangyi avatar Mar 16 '23 07:03 sywangyi

should work with https://github.com/huggingface/transformers/pull/22196

sywangyi avatar Mar 16 '23 07:03 sywangyi

@microsoft-github-policy-service agree [company="intel"]

sywangyi avatar Mar 16 '23 07:03 sywangyi

@microsoft-github-policy-service agree company=intel

sywangyi avatar Mar 16 '23 07:03 sywangyi

@delock @yao-matrix

sywangyi avatar Mar 16 '23 07:03 sywangyi

@RezaYazdaniAminabadi @jeffra @mrwyattii @awan-10 @cmikeh2 @arashb please help review. thanks

sywangyi avatar Mar 17 '23 06:03 sywangyi

https://github.com/huggingface/transformers/pull/22196 has been merged

sywangyi avatar Mar 17 '23 13:03 sywangyi

Hi @sywangyi, thanks for the PR. I tested this on my side and it looks good. We may just want to move these changes to another file since they are model-specific. @lekurile do you think we should move this to the Bloom container perhaps?

molly-smith avatar Mar 23 '23 17:03 molly-smith

Hi @molly-smith, I like the idea of moving this to be more model-specific. I don't know if the BLOOM container is necessarily the appropriate place for something like this, since the containers are mainly used for the replace_with_policy function and checkpoint loading for meta tensors.

However, with this PR, there seems to be a pattern emerging, at least for BLOOM models, where we do some pre-processing on the module (the remove_mask_prepare_for_bloom() and build_bloom_alibi_tensor() functions) before proceeding with the core features of DS inference (kernel injection, model parallelism, etc.).

I'm thinking maybe we can have a more explicit pre-processing stage (e.g. self.module_pre_process() function call) where we handle these details. We can then try to follow the pattern of the containers, where we have an organized place for model-specific preprocessing function definitions, such as the aforementioned BLOOM functions. If we'd like to leverage the existing model containers, we'd have to expand their scope and perhaps think a bit more about their role/function.
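
To make the idea concrete, here is a rough sketch of what such a pre-processing stage could look like. It is illustrative only: the registry, the register_preprocessor decorator, and the module_pre_process() signature are hypothetical names, not existing DeepSpeed APIs, and the two BLOOM hooks are stand-ins that merely record that they ran instead of touching a real model.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: model type -> ordered list of pre-processing hooks.
# None of these names exist in DeepSpeed; they only illustrate the pattern.
PreprocessFn = Callable[[Any], None]
_PREPROCESSORS: Dict[str, List[PreprocessFn]] = {}


def register_preprocessor(model_type: str) -> Callable[[PreprocessFn], PreprocessFn]:
    """Decorator that attaches a pre-processing hook to a model type."""
    def decorator(fn: PreprocessFn) -> PreprocessFn:
        _PREPROCESSORS.setdefault(model_type, []).append(fn)
        return fn
    return decorator


@register_preprocessor("bloom")
def remove_mask_prepare(module: Any) -> None:
    # Stand-in for remove_mask_prepare_for_bloom(); only records the call.
    module.applied.append("remove_mask_prepare")


@register_preprocessor("bloom")
def build_alibi(module: Any) -> None:
    # Stand-in for the build_bloom_alibi_tensor() patching; only records the call.
    module.applied.append("build_alibi")


def module_pre_process(module: Any, model_type: str) -> None:
    """Run every hook registered for model_type, in registration order.

    Model types without registered hooks are left untouched.
    """
    for hook in _PREPROCESSORS.get(model_type, []):
        hook(module)


class DummyModule:
    """Placeholder for a real model; tracks which hooks were applied."""
    def __init__(self) -> None:
        self.applied: List[str] = []
```

With something along these lines, the inference engine would call module_pre_process(module, model_type) once before kernel injection or tensor-parallel sharding, and each model's hooks would live next to its other model-specific code rather than in the engine itself.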

@jeffra, @awan-10, @RezaYazdaniAminabadi Thoughts?

lekurile avatar Mar 23 '23 23:03 lekurile