Frost Mitchell
This PR adds the Cpp template for BMM for FP32, FP16, and BF16. See #125683 for more background. 1. Adds a `CppBmmTemplate` class, which inherits from `CppPackedGemmTemplate`. Given a number of...
### 🐛 Describe the bug When using the PyTorch profiler, the current doc page shows a method of profiling on accelerators by including both `ProfilerActivity.CUDA` and `ProfilerActivity.XPU` in the activities...
Part of #154898. Update kineto submodule. Summary: We add the `toggleCollectionDynamic` functionality to XPUPTI in Kineto so that the profiler can be enabled and disabled dynamically.
This PR enables autocast for ops that do not yet have autocast support (`roi_pool`, `ps_roi_align`, `ps_roi_pool`, `deform_conv2d`). Tests have been updated to check autocast on CPU. In regards to the...