Is CP supported with MTP now?
If CP (context parallelism) does not support MTP (multi-token prediction), how can we train MTP models on long contexts?
https://github.com/NVIDIA/Megatron-LM/blob/bed7dbd1676f785401779291630dbc0002e0f618/megatron/training/arguments.py#L1006
It's not supported yet. We are currently developing this feature, and it is expected to be released in MCore v0.13. For now, tensor parallelism can be used to reduce activation memory, though performance may not be optimal.
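To get a rough feel for how much the tensor-parallel workaround helps, here is a minimal sketch. It is not from the Megatron-LM code base: it assumes the per-layer activation estimates from Korthikanti et al., "Reducing Activation Recomputation in Large Transformer Models" (16-bit activations, no recomputation, no fused attention kernel), and it ignores the extra activations of the MTP module itself. The function name and the example model sizes are hypothetical.

```python
def activation_bytes_per_layer(s, b, h, a, t=1, sequence_parallel=False):
    """Approximate per-GPU activation memory (bytes) of one transformer layer.

    s: sequence length, b: micro-batch size, h: hidden size,
    a: number of attention heads, t: tensor-parallel size.
    Assumed formulas (Korthikanti et al., 2022), not taken from Megatron-LM.
    """
    attn_scores = 5 * a * s / h  # attention-score term, grows with s
    if sequence_parallel:
        # Tensor + sequence parallelism shards every activation across t GPUs.
        factor = (34 + attn_scores) / t
    else:
        # Tensor parallelism alone leaves 10*s*b*h bytes replicated per GPU.
        factor = 10 + 24 / t + attn_scores / t
    return s * b * h * factor


if __name__ == "__main__":
    # Hypothetical long-context setup: 32k tokens, hidden size 4096, 32 heads.
    s, b, h, a = 32768, 1, 4096, 32
    for t, sp in [(1, False), (8, False), (8, True)]:
        gib = activation_bytes_per_layer(s, b, h, a, t, sp) / 2**30
        print(f"TP={t} sequence_parallel={sp}: {gib:.1f} GiB per layer")
```

Under these assumptions, tensor parallelism alone leaves part of the activations replicated on every GPU, while combining it with sequence parallelism divides the whole estimate by the tensor-parallel size. Unlike CP, this is bounded by the tensor-parallel size, which is usually kept within a node.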
Hi @Victarry, when will MCore v0.13 be released?