pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #83050 AOTAutograd retraces the graph module produced by torch dynamo; this PR preserves the stack trace in the original fx.Node.
This PR implements an APEX-style FusedAdam in PyTorch. It differs from the APEX implementation in that it is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #82841 * #82839 * #82837 * __->__ #82836
## Description This PR improves the performance of the quantized kernel for normalize by vectorizing the scalar remainder. In the current implementation [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp), the computation is vectorized while the scalar remainder is handled...
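To make the term "scalar remainder" concrete, here is a minimal plain-Python sketch of the usual pattern: full-width chunks are processed together, and the leftover elements are handled one at a time. It is not the PR's kernel code, and the truncated summary does not show how the PR vectorizes the tail. The names `split_vector_and_remainder`, `scale_all`, and the width of 8 are hypothetical, chosen only for illustration.

```python
def split_vector_and_remainder(n, width):
    """Return (count covered by full chunks of `width`, leftover count)."""
    vec_end = (n // width) * width
    return vec_end, n - vec_end


def scale_all(values, factor, width=8):
    """Multiply every element by `factor`, chunked main loop plus scalar tail.

    `width` stands in for a SIMD register width; 8 is an assumed value.
    """
    out = [0.0] * len(values)
    vec_end, _ = split_vector_and_remainder(len(values), width)

    # "Vectorized" part: whole chunks of `width` elements at a time.
    for start in range(0, vec_end, width):
        chunk = values[start:start + width]
        out[start:start + width] = [v * factor for v in chunk]

    # Scalar remainder: the leftover elements, handled one by one.
    for i in range(vec_end, len(values)):
        out[i] = values[i] * factor

    return out
```

With 19 elements and a width of 8, two full chunks cover 16 elements and 3 are left to the scalar loop; that tail is the part the summary says the PR targets.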
Summary: Currently `SelectiveBuilder` is hardcoding namespace `aten` for operators. This is not working anymore since operators started to have custom namespaces. This fixes it. Test Plan: Rely on newly added...
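The namespace problem described above can be sketched as follows. This is a plain-Python illustration, not the actual `SelectiveBuilder` code: `split_operator_name` is a hypothetical helper that reads the namespace from a qualified operator name such as `aten::add.Tensor` and falls back to `aten` only when no namespace is present.

```python
def split_operator_name(qualified_name, default_namespace="aten"):
    """Split 'ns::op.overload' into (namespace, op, overload).

    Hypothetical helper: the namespace is taken from the name itself
    and `default_namespace` is used only when no '::' is present.
    """
    namespace, sep, rest = qualified_name.partition("::")
    if not sep:
        # No explicit namespace in the name: fall back to the default.
        namespace, rest = default_namespace, qualified_name
    op, _, overload = rest.partition(".")
    return namespace, op, overload


print(split_operator_name("aten::add.Tensor"))
print(split_operator_name("custom::my_op"))
```

Reading the namespace from the name keeps operators registered under custom namespaces distinguishable from `aten` operators that share the same base name.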
### 🐛 Describe the bug Using MPS for BERT inference appears to produce about a 2x slowdown compared to the CPU. Here is code to reproduce the issue: ...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #83137 * #83122 * #82874
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #82841 * __->__ #82839 * #82837 * #82836
This PR proposes a list of CPU-related PyTorch modules that Intel is willing to own or co-own.
* Fixes #78611: reshaping tensors which are channels_last produces unexpected strides. * Fixes the empty-input convolution issue: when the input is empty, e.g. a shape of (0, 3, 3, 4)...