DeepSpeed
Does AutoTP support multimodal training?
Hi @jclyu123-beep, thanks for asking. AutoTP analyzes the model architecture to figure out how to shard the model across cards. In this process, AutoTP uses module names (e.g. proj_q, proj_k) as hints for how to shard each layer properly. So whether multimodal training is supported depends on the model. Sometimes a new pattern-matching rule is needed to support a new model, for example when its modules use names that the existing rules do not recognize. If you are working on a specific model, you can post the issue you encountered and we can discuss further.
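To make the name-matching idea concrete, here is a minimal sketch in plain Python. It is not AutoTP's actual code or rule set: the rule table, the function names (`classify_module`, `plan_sharding`), and the example module names other than proj_q and proj_k are all hypothetical. It assumes the common tensor-parallel convention that query/key/value projections are split column-wise and the attention output projection is split row-wise.

```python
import re

# Hypothetical rules for illustration only, NOT DeepSpeed's real rule set.
# Each rule maps a leaf module name pattern to a sharding strategy.
SHARD_RULES = [
    (re.compile(r"proj_q|proj_k|proj_v"), "column"),  # split output features
    (re.compile(r"proj_out"), "row"),                 # split input features
]


def classify_module(name):
    """Return 'column', 'row', or None if no rule matches the module name."""
    # Only the last path component (e.g. 'proj_q') is used as the hint.
    leaf = name.rsplit(".", 1)[-1]
    for pattern, strategy in SHARD_RULES:
        if pattern.fullmatch(leaf):
            return strategy
    return None


def plan_sharding(module_names):
    """Split module names into a sharding plan and a list of unmatched names."""
    plan = {}
    unmatched = []
    for name in module_names:
        strategy = classify_module(name)
        if strategy is None:
            unmatched.append(name)
        else:
            plan[name] = strategy
    return plan, unmatched


if __name__ == "__main__":
    # Made-up module names: a text tower that follows the known naming
    # and a vision tower that uses a fused projection with a different name.
    names = [
        "text.layers.0.attn.proj_q",
        "text.layers.0.attn.proj_out",
        "vision.blocks.0.attn.qkv",
    ]
    plan, unmatched = plan_sharding(names)
    print("plan:", plan)
    print("unmatched:", unmatched)
```

In this sketch the vision tower's `qkv` module ends up in `unmatched`, which is the kind of situation where a multimodal model would need an additional rule before it can be sharded correctly.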
Hi @inkcherry, is there anything else that should be noted?