Qinwen
# Description

1. Support different C4 variant versions in the c4_mperf datatype.
2. Add DeepSeek V3 convergence configs.
3. Add a DeepSeek v5p recipe.

# Tests

Validate #2 convergence run and...
# Description

Rename `context_parallelism` to `context_autoregressive_parallelism` to separate the CP naming used for inference.

- Training and inference have different sharding...
# Description

* Add a flag to enable the tokamax attention scheduler for better overlap of communication and compute.