[FEATURE]: tensor parallel microbenchmark changes to support microbenchmarking large models
Describe the feature
Problem
The intrahost microbenchmarking CLI tool executes the "None" (DDP) strategy first, and when that run hits an out-of-memory (OOM) error, the microbenchmark does not proceed to the tensor parallel strategies.
Desired solution/support
Intrahost tensor parallelism matters most when a model is large relative to available memory, so the tensor parallel microbenchmark would ideally support benchmarking model sizes that do not fit in a single GPU's memory.
Potential fixes (not mutually exclusive; a combined sketch follows the list)
- Wrap each strategy run in a try/except block so that all strategies are attempted even if some fail with errors.
- Reorder strategy execution so the tensor parallel strategies run before the "None" (DDP) strategy.
- Parameterize the strategy selection, and potentially also the order in which the strategies are executed.
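A minimal sketch of how the three fixes could combine in a simple driver loop. This is not ColossalAI's actual benchmark code: `run_strategy`, the `--strategies` flag, the strategy labels, and the simulated OOM are hypothetical placeholders for the tool's internals.

```python
# Hedged sketch of a strategy-sweep driver; all names are placeholders,
# not ColossalAI's real CLI.
import argparse

import torch


def run_strategy(name: str) -> float:
    """Placeholder: the real tool would build the model and time it here."""
    if name == "none":
        # Simulate the DDP run exhausting GPU memory, as in the reported issue.
        raise torch.cuda.OutOfMemoryError("simulated OOM")
    return 1000.0  # dummy throughput in samples/s


def main() -> None:
    parser = argparse.ArgumentParser(description="strategy-sweep sketch")
    # Fix 3: let the user choose the strategies and their order. The default
    # also applies fix 2 by running DDP ("none") last instead of first.
    parser.add_argument(
        "--strategies",
        nargs="+",
        default=["1d", "2d", "2.5d", "3d", "none"],
        help="strategies to benchmark, executed in the given order",
    )
    args = parser.parse_args()

    results = {}  # strategy name -> throughput, or None if the run failed
    for name in args.strategies:
        # Fix 1: catch per-strategy failures (notably CUDA OOM) so that one
        # failing strategy does not abort the rest of the sweep.
        try:
            results[name] = run_strategy(name)
        except torch.cuda.OutOfMemoryError:  # available in PyTorch >= 1.13
            results[name] = None
            torch.cuda.empty_cache()  # release cached blocks before the next run
        except RuntimeError as err:
            results[name] = None
            print(f"strategy {name!r} failed: {err}")

    for name, throughput in results.items():
        status = f"{throughput:.2f} samples/s" if throughput is not None else "FAILED"
        print(f"{name:>6}: {status}")


if __name__ == "__main__":
    main()
```

With this structure, a run such as `python microbench.py --strategies 1d none` still reports the 1d result even though the simulated DDP run OOMs, which is the behavior this feature request asks for.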