Results 3 issues of Qinwen

# Description
1. Support different c4 variance versions in the c4_mperf datatype.
2. Add deepseek v3 convergence configs.
3. Add deepseek v5p recipe.

# Tests
validate #2 convergence run and...

pull ready

# Description
Rename context_parallelism to context_autoregressive_parallelism to separate CP naming for inference.

The rest of the description includes relevant details and context, examples:
- training and inference have different sharding...

stale

# Description
* Add a flag to enable the tokamax attention scheduler for better overlap of communication and compute.

If the change fixes a bug or a GitHub issue, please include a link,...