Yan Xu
For a graph to be compilable by BladeDISC, its inputs and outputs should be of Tensor type. This works well in the TensorFlow world but is insufficient in the PyTorch world, because a considerable number of inputs/outputs is...
The CI system can skip the build and test steps if a pull request contains only Markdown files.
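A minimal sketch of the skip decision described above, assuming the CI job already has the list of paths changed by the pull request. The function name `is_docs_only` and the `DOC_SUFFIXES` set are hypothetical and are not taken from the BladeDISC CI scripts.

```python
from pathlib import PurePosixPath

# Assumption: these suffixes are what counts as a "markdown file".
DOC_SUFFIXES = {".md", ".markdown"}


def is_docs_only(changed_files):
    """Return True when every changed path is a Markdown file."""
    files = list(changed_files)
    # An empty change list is treated as "not docs-only", so CI still runs.
    return bool(files) and all(
        PurePosixPath(path).suffix.lower() in DOC_SUFFIXES for path in files
    )


if __name__ == "__main__":
    changed = ["README.md", "docs/build_from_source.md"]
    print("skip build and test" if is_docs_only(changed) else "run build and test")
```

Treating an empty change list as "run CI" is a conservative choice: if the changed-file list cannot be computed, the build and test steps are not silently skipped.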
This PR adds shape analysis for some ops. To make this pass more stable, it is better to add a unit test that checks the static shape and dynamic shape for a new...
Add a scalar-reduction codegen template; the algorithm comes from https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
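The referenced slides describe a tree-based reduction in shared memory. The sketch below simulates one thread block using the sequential-addressing variant from those slides in plain Python. It illustrates the technique only and is not the template emitted by this PR; `block_reduce_sum` is a hypothetical name, and padding the buffer to a power of two is an assumption made to keep the loop simple.

```python
def block_reduce_sum(values):
    """Simulate one thread block summing `values` with sequential addressing."""
    # Round the buffer size up to a power of two (assumption for this sketch).
    n = 1
    while n < len(values):
        n *= 2
    # "Shared memory", padded with the additive identity.
    sdata = list(values) + [0] * (n - len(values))

    stride = n // 2
    while stride > 0:
        # On a GPU, every thread with tid < stride runs this step in parallel,
        # followed by a barrier (__syncthreads) before the next stride.
        for tid in range(stride):
            sdata[tid] += sdata[tid + stride]
        stride //= 2
    # Thread 0 holds the block's result.
    return sdata[0]


if __name__ == "__main__":
    print(block_reduce_sum([1, 2, 3, 4, 5]))
```

Within each step, reads touch indices at or above `stride` and writes touch indices below it, so the sequential loop gives the same result as the parallel version.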
To optimize distributed training graphs (DP, FSDP), DISC needs to support collective ops as a prerequisite:
- [ ] support collective ops compilation and execution (all_reduce, all_gather, reduce_scatter) @Yancey1989...
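For reference, the semantics of the three collectives named above can be sketched as a single-process simulation, where `rank_inputs[r]` is the buffer held by rank `r`. This is an illustration only, assuming sum as the reduction and equal-length buffers; it does not reflect how DISC compiles or executes these ops.

```python
def all_reduce(rank_inputs):
    """Every rank receives the element-wise sum over all ranks."""
    total = [sum(column) for column in zip(*rank_inputs)]
    return [list(total) for _ in rank_inputs]


def all_gather(rank_inputs):
    """Every rank receives the concatenation of all ranks' buffers."""
    gathered = [x for buf in rank_inputs for x in buf]
    return [list(gathered) for _ in rank_inputs]


def reduce_scatter(rank_inputs):
    """Sum element-wise, then rank r keeps only the r-th chunk of the result."""
    world_size = len(rank_inputs)
    total = [sum(column) for column in zip(*rank_inputs)]
    # Assumption: the buffer length is divisible by the number of ranks.
    chunk = len(total) // world_size
    return [total[r * chunk:(r + 1) * chunk] for r in range(world_size)]


if __name__ == "__main__":
    inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]
    print(all_reduce(inputs))
    print(all_gather(inputs))
    print(reduce_scatter(inputs))
```

Under these assumptions, a `reduce_scatter` followed by an `all_gather` yields the same buffers as a single `all_reduce`.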