Zhipeng Zhang

Results 7 issues of Zhipeng Zhang

Dear LLaMA Teams, A huge thank you for making your remarkable work available to the public! I've taken a close look at the pretraining loss curves depicted in Figure 1...

Since we aim to use mcore for training, we have a function that parses the Megatron-LM args into mcore args. However, the key of `output_layer_init_method` is...

stale

Dear LLaMA Teams, A huge thank you for making your remarkable work available to the public! I've taken a close look at the pretraining loss curves depicted in Figure 1...

Hi, I've noticed that you have implemented support for overlapping computation and communication in tensor parallel operations. This is a significant enhancement that has the potential to...

enhancement

Hi, I've been exploring the impressive work you've done on incorporating FP8 GEMM to accelerate matrix multiplication operations in TransformerEngine. The initiative is well supported by the findings in the...

After running

```sh
cd hopper
python setup.py install
export PYTHONPATH=$PWD
pytest -q -s test_flash_attn.py
```

I got the following assertion error:

```
FAILED test_flash_attn.py::test_flash_attn_output[257-1-128-False-False-mha-dtype0] - AssertionError: assert 0.0078125
```

**Describe the bug** The example code here [1] fails to run with mnk=(1638, 6144, 3584) and reports `Got cutlass error: Invalid status at: 670`. **Steps/Code to reproduce bug** ``` cd cutlass/examples/57_hopper_grouped_gemm nvcc...

bug
? - Needs Triage