buptzyb
Thanks for the review, Changhui! Improved the coding style according to your advice.
Hi @changhuilin, what's our next step on this PR? Thank you!
@changhuilin Thank you for taking care of this PR! I've updated the description and merged the latest head into this branch.
@changhuilin So, what's our next move on this?
As of today, `--external-cuda-graph` must be used together with `--te-rng-tracker`. I suspect your phase 3 error is still strange behavior caused by OOM. Could you run some small tests first, such as running...
These are my arguments for the 8x7B CUDA graph run, but I tested with 4 nodes: `--position-embedding-type rope --normalization RMSNorm --swiglu --no-position-embedding --no-masked-softmax-fusion --tokenizer-type Llama2Tokenizer --tokenizer-model xxxxx/mixtral-tokenizer.model --ffn-hidden-size 14336 --group-query-attention --num-query-groups 8 --num-layers...
Correct, you need to pass `io_memory_reduction=True` to [make_graphed_callables](https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/training/training.py#L630) to enable it. Your error is quite strange; I cannot think of a reason why the old and new data...
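For example, with this PR applied the call would look roughly like the sketch below. The layers and sample inputs are stand-ins, not the exact `training.py` call site; only `io_memory_reduction` is the new keyword argument, and it only exists once this PR is merged.

```python
import torch
import transformer_engine.pytorch as te

# Stand-in callables and sample inputs, just to make the call shape concrete.
# In Megatron-LM the graphed callables are the transformer layers themselves.
layers = [te.Linear(1024, 1024) for _ in range(2)]
sample_args = tuple((torch.randn(8, 1024, device="cuda"),) for _ in range(2))

graphed_layers = te.make_graphed_callables(
    tuple(layers),             # callables to capture into CUDA graphs
    sample_args,               # per-callable sample inputs used during capture
    num_warmup_iters=3,        # warmup iterations before graph capture
    io_memory_reduction=True,  # the flag discussed above, added by this PR
)
```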
Hi @Baibaifan, I tested with your configuration on my side and everything went well... Here is the memory log; CUDA graph (orange) takes about 2 GB more memory than non-CUDA-graph (green):...
What's the throughput once the MoE balance loss is low enough? If you only compare throughput at the very first steps, the numbers may not be meaningful.
I ran some tests and found that this problem is in...