JackCaoG
## 🐛 Bug

Steps to reproduce the behavior: run

```
# save this file as "debug_bf16.py"
import torch
import torch_xla.core.xla_model as xm

cast_after_init = True
# cast_after_init = False
device...
```
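The script above is truncated; a minimal sketch of the pattern it describes (casting a module to `bfloat16` before vs. after moving it to the XLA device) might look like the following, where the module and shapes are placeholders rather than the original `debug_bf16.py`:

```
# Hypothetical sketch, not the original "debug_bf16.py": compare casting to
# bfloat16 after vs. before moving the module to the XLA device.
import torch
import torch_xla.core.xla_model as xm

cast_after_init = True  # flip to False to cast before moving to the device

device = xm.xla_device()
model = torch.nn.Linear(4, 4)  # placeholder module

if cast_after_init:
    model = model.to(device).to(torch.bfloat16)
else:
    model = model.to(torch.bfloat16).to(device)

out = model(torch.randn(2, 4, dtype=torch.bfloat16, device=device))
xm.mark_step()
print(out.dtype, out.device)
```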
## 🚀 Feature

Currently PyTorch/XLA uses `xla::Shape` all over the place. Common uses of `xla::Shape` are to get the number of elements of a tensor and to compare shapes for equality between...
## 🐛 Bug

## To Reproduce

Build both PyTorch and PyTorch/XLA from head.

Update: the real error is `kl_div_backward`, see https://github.com/pytorch/xla/issues/3682#issuecomment-1174627323

Error message:

~~Building torch_xla version: 1.13~~
~~XLA Commit ID: 6a03a5dcf6c0c057577a3b9742840040a030298a...
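The full repro is cut off above; a minimal sketch that exercises the `kl_div_backward` lowering on an XLA device, with shapes and reduction mode chosen arbitrarily rather than taken from the issue, could be:

```
# Hypothetical repro sketch for the kl_div_backward lowering; shapes and
# reduction are arbitrary choices, not taken from the original issue.
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

device = xm.xla_device()
logits = torch.randn(4, 5, device=device, requires_grad=True)
inp = torch.log_softmax(logits, dim=-1)
target = torch.softmax(torch.randn(4, 5, device=device), dim=-1)

loss = F.kl_div(inp, target, reduction="batchmean")
loss.backward()   # hits the kl_div_backward op
xm.mark_step()    # forces compilation/execution of the traced graph
```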
Today, if we have a bunch of DeviceData IR nodes and call `xm.mark_step`, we will execute a graph that looks like

```
# ENTRY %IrToHlo.4 (p0.1: f32[4,2,2], p1.2: f32[4,2,2]) -> (f32[4,2,2], f32[4,2,2])...
```
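To make the scenario concrete, a small sketch (tensor shapes and values are placeholders) that produces only DeviceData IR and then calls `xm.mark_step` would be:

```
# Placeholder illustration: tensors moved to the XLA device become DeviceData
# IR nodes, and mark_step then executes a trivial graph that just returns them.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
a = torch.randn(4, 2, 2).to(device)  # DeviceData IR node
b = torch.randn(4, 2, 2).to(device)  # DeviceData IR node

xm.mark_step()  # runs an IrToHlo graph whose outputs are just a and b
```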
### 🚀 The feature, motivation and pitch

`torch.compile` can take on the order of seconds to compile a decent-size model like Llama2 7B with an `aot-autograd` enabled backend. Note that I...
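One rough way to observe the aot-autograd compile overhead being described, using a tiny placeholder model rather than Llama2 7B and the stock `aot_eager` backend, is to time the first call against later calls:

```
# Tiny placeholder model, nothing like Llama2 7B; aot_eager enables
# aot-autograd without real codegen, so the first-call overhead is tracing.
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 256))
compiled = torch.compile(model, backend="aot_eager")
x = torch.randn(8, 256)

start = time.time()
compiled(x)   # first call pays the tracing/compilation cost
print("first call:", time.time() - start)

start = time.time()
compiled(x)   # later calls reuse the compiled artifact
print("second call:", time.time() - start)
```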
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #125808

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @voznesenskym @penguinwu @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv...