Chi Zhang
### Versions

- [ ] Python version:
- [ ] Python architecture:
- [ ] Operating system and version:
- [ ] OpenDSSDirect.py version number:

### Feature Request

### Bug...
Just wondering: does the current PipelineStage API support variable-length input shapes, as in Megatron? https://github.com/NVIDIA/Megatron-LM/blob/e33c8f78a35765d5aa37475a144da60e8a2349d1/megatron/core/model_parallel_config.py#L212 This is particularly useful for packed inputs, where all padding is removed.
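To illustrate why packed inputs lead to variable shapes, here is a minimal sketch in plain Python. It does not use the torchtitan or Megatron APIs; `pack_sequences` and the `cu_seqlens` name are assumptions chosen for illustration.

```python
from itertools import accumulate


def pack_sequences(sequences):
    """Concatenate variable-length sequences without any padding.

    Returns the flat token list and the cumulative sequence boundaries,
    so that sequence i is tokens[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    tokens = [tok for seq in sequences for tok in seq]
    cu_seqlens = [0, *accumulate(len(seq) for seq in sequences)]
    return tokens, cu_seqlens


# Two hypothetical microbatches with different total token counts.
batch_a = [[1, 2, 3], [4, 5]]
batch_b = [[6], [7, 8, 9, 10], [11, 12]]

tokens_a, cu_a = pack_sequences(batch_a)
tokens_b, cu_b = pack_sequences(batch_b)

# The packed length differs per microbatch (5 vs. 7 tokens here).
print(len(tokens_a), cu_a)
print(len(tokens_b), cu_b)
```

Because the packed length changes from one microbatch to the next, a pipeline stage that preallocates receive buffers for a fixed input shape would not know the right size ahead of time.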
We are currently trying to apply torchtitan to MoE models. MoE models require grouped_gemm (https://github.com/fanshiqing/grouped_gemm). GroupedGemm ops basically follow the same rules as ColumnLinear and RowLinear. Is there...
We should make them mutually exclusive by adding an assertion in the config.
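A minimal sketch of what such a check could look like, assuming the config is a dataclass. `ParallelConfig`, `enable_option_a`, and `enable_option_b` are hypothetical placeholders, not the actual option names.

```python
from dataclasses import dataclass


@dataclass
class ParallelConfig:
    # Hypothetical flags standing in for the two options being discussed.
    enable_option_a: bool = False
    enable_option_b: bool = False

    def __post_init__(self):
        # Fail at construction time if both options are enabled together.
        assert not (self.enable_option_a and self.enable_option_b), (
            "enable_option_a and enable_option_b are mutually exclusive"
        )


def try_build(**kwargs):
    """Build a config and report either 'ok' or the assertion message."""
    try:
        ParallelConfig(**kwargs)
        return "ok"
    except AssertionError as err:
        return str(err)


print(try_build(enable_option_a=True))
print(try_build(enable_option_a=True, enable_option_b=True))
```

Note that `assert` statements are skipped when Python runs with `-O`, so raising a `ValueError` instead may be preferable if the check must always run.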