Yukun He
Unify the two versions of the AllReduce op at the Module and custom op levels.
* Add a specific environment variable to control the logger level of AutoTuner.
* Add statistics to track the total profiling time for each op. This will help determine the...
We found that release/1.1 also has this issue, which may cause a performance drop.

## Summary by CodeRabbit

* **Refactor**
  * Optimized internal tensor allocation for NVFP4 uint8 operations...