Does AutoTP support multimodal training?

Open: jclyu123-beep opened this issue 3 months ago • 1 comment

jclyu123-beep avatar Sep 11 '25 02:09 jclyu123-beep

Hi @jclyu123-beep, thanks for asking. AutoTP analyzes the model architecture to figure out how to shard the model across cards. In this process, AutoTP uses module names (e.g. proj_q, proj_k) as hints for how to shard each module properly. So whether multimodal training is supported depends on the model. A new pattern-matching rule may sometimes be needed to support a new model. If you are working on a specific model, you can post the issue you encountered and we can discuss further.
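
To make the name-based idea above concrete, here is a minimal sketch in plain Python. It is not DeepSpeed's actual implementation or rule table: `SHARD_RULES`, `classify_module`, `plan_sharding`, the extra module names (`q_proj`, `out_proj`, `qkv`, and so on), and the mapping of query/key/value projections to column-wise sharding and output projections to row-wise sharding are all assumptions for illustration.

```python
import re

# Hypothetical rules for illustration only; NOT DeepSpeed's real rule table.
# Each rule maps a module-name pattern to an assumed sharding strategy.
SHARD_RULES = [
    # Assumed: query/key/value projections are split column-wise.
    (re.compile(r"(^|\.)(proj_q|proj_k|proj_v|q_proj|k_proj|v_proj)$"), "column"),
    # Assumed: attention output projections are split row-wise.
    (re.compile(r"(^|\.)(out_proj|o_proj)$"), "row"),
]


def classify_module(name: str) -> str:
    """Return the strategy of the first rule matching the module name."""
    for pattern, strategy in SHARD_RULES:
        if pattern.search(name):
            return strategy
    # No rule matched: keep the module replicated on every card.
    return "replicate"


def plan_sharding(module_names):
    """Build a {module name: strategy} plan from a list of module names."""
    return {name: classify_module(name) for name in module_names}


if __name__ == "__main__":
    names = [
        "language_model.layers.0.attn.proj_q",
        "language_model.layers.0.attn.out_proj",
        "vision_tower.blocks.0.attn.qkv",  # fused name that no rule covers
    ]
    for name, strategy in plan_sharding(names).items():
        print(f"{name}: {strategy}")
```

In this sketch the fused `qkv` module of the hypothetical vision tower matches no rule and stays replicated, which is the kind of case where a new pattern-matching rule would be needed for a new model.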

Hi @inkcherry, is there anything else that needs to be noted?

delock avatar Sep 17 '25 00:09 delock