eellison
Adds a fake tensor, which augments meta tensors with a device, and does device propagation on operators.

Miscellaneous notes:
- I still need to add `FakeMode` which will cover constructors....

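
To make the idea of "a meta tensor plus a device" concrete, here is a minimal pure-Python sketch. It is not PyTorch's implementation: `FakeTensorSketch`, `propagate_device`, and `fake_matmul` are hypothetical names, and the rule that all inputs must share one device is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FakeTensorSketch:
    """Hypothetical stand-in: shape metadata plus the device it claims to live on."""
    shape: Tuple[int, ...]
    device: str  # e.g. "cpu" or "cuda:0"; no storage is ever allocated


def propagate_device(*inputs: FakeTensorSketch) -> str:
    """Choose the output device for an op (assumed rule: all inputs must agree)."""
    devices = {t.device for t in inputs}
    if len(devices) != 1:
        raise RuntimeError(f"inputs on different devices: {sorted(devices)}")
    return devices.pop()


def fake_matmul(a: FakeTensorSketch, b: FakeTensorSketch) -> FakeTensorSketch:
    """Compute only the output metadata of a 2-D matmul; no data is touched."""
    if len(a.shape) != 2 or len(b.shape) != 2 or a.shape[1] != b.shape[0]:
        raise ValueError(f"incompatible shapes: {a.shape} @ {b.shape}")
    return FakeTensorSketch((a.shape[0], b.shape[1]), propagate_device(a, b))


if __name__ == "__main__":
    x = FakeTensorSketch((4, 8), "cuda:0")
    y = FakeTensorSketch((8, 2), "cuda:0")
    print(fake_matmul(x, y))
```

Because the sketch tracks only metadata, it can report that an op would mix devices without ever allocating memory on any of them.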
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #125780 * #125773 * #125772 For mm inputs which are not inputs of the graph, assume that we can memory plan them...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #125780 * __->__ #125773 * #125772 Relanding just the pad in a single pass portion of [the pr](https://github.com/pytorch/pytorch/pull/118522). Not including the transpose logic:...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #125455 ...
### 🐛 Describe the bug

```
import torch

@torch.compile(mode="max-autotune")
def foo(x, y):
    return x @ y

x = torch.empty_strided((50257, 32768), ((1, 50304)), dtype=torch.bfloat16, device='cuda')
y = torch.empty_strided((32768, 768), (768, 1),...
```
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #126570 * __->__ #126560 This adds logging that will mark any invocation of a matmul for a particular set of input shapes, and record every...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #126570 * #126560 We had a previous PR that added configs for an internal model. Running the below script on output from...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #120278 * #121998 * #121997 * #120275 * #121996 ...
Within PyTorch's TorchInductor we JIT-compile many Triton functions, often 100+. We currently have a mechanism that initializes a [pool of forked processes](https://github.com/pytorch/pytorch/blob/745d29b0cc0502594ab196057fd4f1bad36ebc4a/torch/_inductor/codecache.py#L1215) in order to parallelize triton...
### 🚀 The feature, motivation and pitch Saved activations that are unaliased should be annotated as donated buffers to inductor. Donated buffers are buffers which will be considered dead after...