Frost Mitchell

4 issues by Frost Mitchell

This PR adds the C++ template for BMM (batched matrix multiplication), supporting FP32, FP16, and BF16. See #125683 for more background. 1. Adds the `CppBmmTemplate` class, which inherits from `CppPackedGemmTemplate`. Given a number of...
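For context on what the BMM template computes: BMM (as in `torch.bmm`) multiplies two batches of matrices elementwise along the batch dimension, so `out[i] = a[i] @ b[i]`. The sketch below is a naive pure-Python reference of those semantics only, assuming inputs of shape (B, M, K) and (B, K, N); it is not the code generated by `CppBmmTemplate`, and `bmm_reference` is a hypothetical name used for illustration.

```python
from typing import List

Matrix = List[List[float]]


def bmm_reference(a: List[Matrix], b: List[Matrix]) -> List[Matrix]:
    """Hypothetical naive batched matmul: out[i] = a[i] @ b[i].

    a has shape (B, M, K), b has shape (B, K, N); the result has shape (B, M, N).
    """
    if len(a) != len(b):
        raise ValueError("batch sizes must match")
    out: List[Matrix] = []
    for a_i, b_i in zip(a, b):
        k = len(b_i)  # inner dimension K, taken from b's row count
        if any(len(row) != k for row in a_i):
            raise ValueError("inner dimensions must match")
        n = len(b_i[0]) if k else 0
        # One ordinary matrix multiply per batch element.
        c_i = [
            [sum(row[p] * b_i[p][j] for p in range(k)) for j in range(n)]
            for row in a_i
        ]
        out.append(c_i)
    return out
```

An optimized template replaces the inner triple loop with blocked, vectorized GEMM code, but the per-batch result must match this reference.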

oncall: distributed
module: cpu
module: mkldnn
open source
module: amp (automated mixed precision)
NNC
ciflow/trunk
release notes: quantization
release notes: releng
module: inductor
module: dynamo
ciflow/inductor
module: distributed_checkpoint

🐛 Describe the bug: When using the PyTorch profiler, the current documentation page shows a method of profiling on accelerators by including both `ProfilerActivity.CUDA` and `ProfilerActivity.XPU` in the activities...

oncall: profiler
module: xpu

Part of #154898. Update kineto submodule. Summary: We add the `toggleCollectionDynamic` functionality to XPUPTI in Kineto so that the profiler can be enabled or disabled dynamically.

open source
ciflow/trunk
topic: not user facing
ciflow/xpu

This PR enables autocast for ops that do not yet have autocast support (`roi_pool`, `ps_roi_align`, `ps_roi_pool`, `deform_conv2d`). Tests have been updated to check autocast on CPU. Regarding the...

cla signed