Brian Hirsh
Just blatting in the contents of https://github.com/pytorch/pytorch/pull/123880 to confirm that this is the only remaining problem preventing green CI on the subsequent PR. Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): *...
Repro below. It looks like even though `dtype=torch.bfloat16` is specified in the autocast context manager, the intermediates in the backward are getting cast down to `torch.float16` before matmul is invoked...
I noticed that when compiling a small microbenchmark (with Inductor warm caching), E2E compile times were ~4s with CUDA tensors and ~15s with CPU tensors. It looks like the majority...
Fixes https://github.com/pytorch/pytorch/issues/141149. `aten.copy_` supports numbers as tensors in the Python arg parser, so we need to give the same treatment to `aten.copy`. Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__...