DeepSpeed
Enable autoTP for bloom
should work with https://github.com/huggingface/transformers/pull/22196
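For reference, a minimal sketch of how autoTP might be exercised for BLOOM once both PRs land, assuming the public `deepspeed.init_inference` entry point; passing `replace_with_kernel_inject=False` is what routes the model through automatic tensor parallelism rather than fused-kernel injection. The model name and launch setup are illustrative only.

```python
# Illustrative only: launch with e.g. `deepspeed --num_gpus 2 run_bloom.py`.
import os

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small BLOOM variant for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# With replace_with_kernel_inject=False, DeepSpeed skips fused-kernel
# injection and falls back to autoTP, sharding the attention/MLP linear
# layers across the launched ranks.
engine = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "1")),
    dtype=torch.float16,
    replace_with_kernel_inject=False,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(engine.module.device)
outputs = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```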
@microsoft-github-policy-service agree company=intel
@delock @yao-matrix
@RezaYazdaniAminabadi @jeffra @mrwyattii @awan-10 @cmikeh2 @arashb please help review. Thanks!
https://github.com/huggingface/transformers/pull/22196 has been merged
Hi @sywangyi, thanks for the PR. I tested this on my side and it looks good. We may just want to move these changes to another file since they are model-specific. @lekurile do you think we should move this to the Bloom container perhaps?
Hi @molly-smith, I like the idea of moving this to be more model-specific. I don't know if the BLOOM container is necessarily the appropriate place for something like this, since the containers are mainly used for the replace_with_policy function and checkpoint loading for meta tensors.
However, with this PR, there seems to be a pattern emerging, at least for BLOOM models, where we do some pre-processing on the module (the remove_mask_prepare_for_bloom() and build_bloom_alibi_tensor() functions) before proceeding with the core features of DS inference (kernel inject, MP, etc.).
I'm thinking maybe we can have a more explicit pre-processing stage (e.g. a self.module_pre_process() function call) where we handle these details. We can then try to follow the pattern of the containers, where we have an organized place for model-specific pre-processing function definitions, such as the aforementioned BLOOM functions. If we'd like to leverage the existing model containers, we'd have to expand their scope and perhaps think a bit more about their role/function.
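To make the idea concrete, here is a minimal, self-contained sketch of what such a pre-processing stage could look like. The dispatch table, the _get_model_type helper, and the two hook bodies are hypothetical stand-ins for the BLOOM functions mentioned above, not the actual DeepSpeed implementation.

```python
# Hypothetical sketch of an explicit module pre-processing stage.
from typing import Callable, Dict, List


class EnginePreProcessMixin:
    def _remove_mask_prepare_for_bloom(self) -> None:
        # Stand-in for removing BLOOM's attention-mask preparation.
        print("patching BLOOM mask preparation")

    def _build_bloom_alibi_tensor(self) -> None:
        # Stand-in for rebuilding the ALiBi tensor for tensor parallelism.
        print("rebuilding BLOOM alibi tensor")

    def _get_model_type(self) -> str:
        # Would normally be derived from the wrapped module's config.
        return "bloom"

    def module_pre_process(self) -> None:
        # One organized place for model-specific hooks, keyed by model
        # type, run before kernel injection / MP is applied.
        hooks: Dict[str, List[Callable[[], None]]] = {
            "bloom": [
                self._remove_mask_prepare_for_bloom,
                self._build_bloom_alibi_tensor,
            ],
        }
        for hook in hooks.get(self._get_model_type(), []):
            hook()


EnginePreProcessMixin().module_pre_process()
```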
@jeffra, @awan-10, @RezaYazdaniAminabadi Thoughts?