Yan Xu

Results 84 issues of Yan Xu

To be compilable by BladeDISC, inputs and outputs must be of Tensor type. This works well in the TensorFlow world but is insufficient in the PyTorch world, because a considerable number of inputs/outputs is...

need discussion

The CI system can skip the build and test step if a pull request contains markdown files only.

CI
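A minimal sketch of the skip check described above, assuming the CI job can obtain the list of paths changed by the pull request. The function name `is_docs_only` and the suffix set are hypothetical and not taken from the project's CI configuration.

```python
from pathlib import PurePosixPath

# Assumed set of suffixes treated as markdown; adjust to the repository's needs.
MARKDOWN_SUFFIXES = {".md", ".markdown"}


def is_docs_only(changed_files):
    """Return True when every changed path is a markdown file."""
    files = list(changed_files)
    if not files:
        # An empty change list is ambiguous, so run the full pipeline to be safe.
        return False
    return all(PurePosixPath(f).suffix.lower() in MARKDOWN_SUFFIXES for f in files)


if __name__ == "__main__":
    # The build and test steps would be skipped only for the first case.
    print(is_docs_only(["README.md", "docs/build.md"]))
    print(is_docs_only(["README.md", "src/kernel.cc"]))
```

Treating an empty change list as "run everything" is a conservative choice, so a failure to compute the diff never silently skips the tests.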

This PR adds shape analysis for some ops. To make this pass more stable, it would be better to add a unit test that checks the static shape and dynamic shape for a new...

Add a scalar-reduction codegen template. The algorithm comes from https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
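For readers unfamiliar with the referenced slides, the sketch below simulates the sequential-addressing tree reduction of a single thread block on the CPU. It only illustrates the general technique and is not the codegen template from this PR; the name `block_reduce_sum` is hypothetical, and padding to a power of two is an assumption made to keep the loop simple.

```python
def block_reduce_sum(values):
    """Simulate one thread block's tree reduction with sequential addressing."""
    # Pad the "shared memory" buffer to a power of two (assumption for simplicity).
    n = 1
    while n < len(values):
        n *= 2
    shared = list(values) + [0] * (n - len(values))

    stride = n // 2
    while stride > 0:
        # Each loop iteration stands in for one GPU thread with index `tid`.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        # On a GPU, a block-wide barrier (__syncthreads) would separate the steps.
        stride //= 2
    return shared[0]


if __name__ == "__main__":
    print(block_reduce_sum([1, 2, 3, 4, 5]))
```

Halving the stride each step means the active threads always touch a contiguous range of the buffer, which is the property the sequential-addressing variant relies on.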

To optimize distributed training graphs (DP, FSDP), DISC needs to support collective ops as a preliminary step.
- [ ] support collective ops compilation and execution (all_reduce, all_gather, reduce_scatter) @Yancey1989...
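As a reference for the three collectives named above, here is a single-process sketch of their usual semantics with a sum reduction. Each inner list stands for one rank's buffer. The function signatures are hypothetical, this is not DISC's or any communication library's API, and `reduce_scatter` assumes the buffer length is divisible by the number of ranks.

```python
def all_reduce(per_rank):
    """Every rank ends up with the element-wise sum over all ranks."""
    total = [sum(col) for col in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def all_gather(per_rank):
    """Every rank ends up with the concatenation of all ranks' buffers."""
    gathered = [x for buf in per_rank for x in buf]
    return [list(gathered) for _ in per_rank]


def reduce_scatter(per_rank):
    """Sum element-wise, then give rank r the r-th equal chunk of the result."""
    world = len(per_rank)
    total = [sum(col) for col in zip(*per_rank)]
    chunk = len(total) // world  # assumes len(total) % world == 0
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two simulated ranks
    print(all_reduce(ranks))
    print(all_gather(ranks))
    print(reduce_scatter(ranks))
```

In this model, applying `all_gather` to the output of `reduce_scatter` reproduces the `all_reduce` result, which is the usual way the three ops relate in data-parallel and sharded training.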