123 comments of Xin Yao

Ready for review. The CI failures are unrelated to this change. @nvMelissa @timmoon10 @zhongbozhu

From the log, the Flash Attention backend is disabled because you set `NVTE_FLASH_ATTN=0`, and the cuDNN attention backend is disabled because the input is not supported. So a quick fix is...
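A minimal sketch of the env-var side of that fix: `NVTE_FLASH_ATTN` is the TransformerEngine switch named in the log, and `NVTE_DEBUG`/`NVTE_DEBUG_LEVEL` (assumed here as the usual backend-selection logging knobs) help confirm which backend is actually chosen.

```shell
# Re-enable the Flash Attention backend: either unset the variable that
# disabled it, or set it to 1 explicitly.
unset NVTE_FLASH_ATTN
export NVTE_FLASH_ATTN=1

# Optional: turn on backend-selection logging to verify the choice
# (assumed TransformerEngine debug knobs).
export NVTE_DEBUG=1
export NVTE_DEBUG_LEVEL=2

echo "NVTE_FLASH_ATTN=$NVTE_FLASH_ATTN"
```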

@sanandaraj5597 @timmoon10 Could you please review? The previous BF16 backward may lead to divergence in some cases (reported by several customers).

@RandMist You need to sign off your commits (`git commit -s`). See [this](https://github.com/NVIDIA/TransformerEngine/pull/2325/checks?check_run_id=54164971208).
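For reference, a throwaway-repo demo of what `-s` does (it appends the DCO `Signed-off-by:` trailer), plus the standard way to retroactively sign off commits already on the branch:

```shell
# Demo in a temporary repo: `git commit -s` appends a Signed-off-by trailer.
cd "$(mktemp -d)" && git init -q .
git config user.name  "Demo User"
git config user.email "demo@example.com"
echo hello > file.txt && git add file.txt
git commit -q -s -m "Example commit"
git log -1 --format=%B   # last line: Signed-off-by: Demo User <demo@example.com>

# Commits already pushed to the PR branch can be signed off in bulk
# (N is the number of commits to rewrite), then force-pushed:
#   git rebase --signoff HEAD~N
#   git push --force-with-lease
```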

> My understanding is that we have control of both edges between kernels, we can modify the launch of the current kernel with `cudaLaunchKernelEx` and we can modify if the...
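A sketch of what "modify the launch of the current kernel with `cudaLaunchKernelEx`" can look like, assuming the programmatic-dependent-launch (PDL) mechanism from CUDA 11.8+ on Hopper-class GPUs; the `producer`/`consumer` kernels here are hypothetical stand-ins, not code from the PR.

```cuda
#include <cuda_runtime.h>

__global__ void producer(float *buf) {
    buf[threadIdx.x] = 1.0f;
    // Signal that dependent work may start before this kernel fully exits.
    cudaTriggerProgrammaticLaunchCompletion();
}

__global__ void consumer(float *buf) {
    // Block until the programmatically dependent producer has triggered.
    cudaGridDependencySynchronize();
    buf[threadIdx.x] += 1.0f;
}

int main() {
    float *buf;
    cudaMalloc(&buf, 128 * sizeof(float));

    producer<<<1, 128>>>(buf);

    // Launch the consumer with a launch attribute that allows it to begin
    // before the producer completes (the "edge between kernels").
    cudaLaunchConfig_t cfg = {};
    cfg.gridDim  = dim3(1);
    cfg.blockDim = dim3(128);

    cudaLaunchAttribute attr = {};
    attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
    attr.val.programmaticStreamSerializationAllowed = 1;
    cfg.attrs    = &attr;
    cfg.numAttrs = 1;

    cudaLaunchKernelEx(&cfg, consumer, buf);
    cudaDeviceSynchronize();
    cudaFree(buf);
    return 0;
}
```

The design point is that both edges are controlled from opposite sides: the launch attribute on the second kernel relaxes the serialization, while the device-side synchronize/trigger pair re-establishes the ordering that actually matters.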