buptzyb comments

Results: 39 comments by buptzyb

feat(MoE): Refactor cuda_graph_scope

/ok to test ebb0e9d9b74a698abdcf7d99d01721fb859383ba

feat(MoE): Refactor cuda_graph_scope

/ok to test c07462349a86e33af9211ccb44a13ba832683be5

feat(MoE): Refactor cuda_graph_scope

/ok to test a35489f3ae2670f5b10dda306ead45b4cf814488

feat(MoE): Refactor cuda_graph_scope

> All my outstanding questions/comments re: cudagraphs are resolved. I will defer formal approval on behalf of @NVIDIA/inference in case anyone else wants to weigh in before merge. Thanks! @kvareddy...

feat(MoE): Refactor cuda_graph_scope

/ok to test 0337f2053bafd3affb50a56a0d01c8650141b98d

feat(MoE): Refactor cuda_graph_scope

Hi @rogerwaleffe @duncanriach @JRD971000 could you help review on behalf of NVIDIA/hybrid-mamba? Thanks!

feat(MoE): Refactor cuda_graph_scope

> If one wants to use per-layer cuda-graphs (`--cuda-graph-scope full` as of today in main), do we set `--cuda-graph-scope` as `attn mlp`? One option is to set `--cuda-graph-scope attn mlp`...

feat(MoE): Refactor cuda_graph_scope

> Thanks for the clarification. Is this behavior present for both the local and TE implementation or just for TE? Mcore inference solely uses the local implementation, hence my question....

feat(MoE): Refactor cuda_graph_scope

Hi @rogerwaleffe @duncanriach @JRD971000 could you help review on behalf of NVIDIA/hybrid-mamba? @kvareddy @santhnm2 could you help review? Thanks!

feat(MoE): Refactor cuda_graph_scope

/ok to test e82523232f24385d44b3ca656d8d297ba866cb07