Head is broken with `kl_div_backward`
🐛 Bug
To Reproduce
Build PyTorch and PyTorch/XLA both from head.
Update: the real error is `kl_div_backward`, see https://github.com/pytorch/xla/issues/3682#issuecomment-1174627323
error message
~~Building torch_xla version: 1.13~~
~~XLA Commit ID: 6a03a5dcf6c0c057577a3b9742840040a030298a~~
~~PyTorch Commit ID: c78bdd7ba04ae514a46258083c67f165608f3d27~~
~~Traceback (most recent call last):~~
~~File "/home/jackcao/pytorch/xla/scripts/gen_lazy_tensor.py", line 58, in~~
~~run_gen_lazy_tensor(~~
~~File "/usr/local/lib/python3.8/dist-packages/torchgen/gen_lazy_tensor.py", line 302, in run_gen_lazy_tensor~~
~~parsed_yaml = parse_native_yaml(native_yaml_path, tags_yaml_path)~~
~~File "/usr/local/lib/python3.8/dist-packages/torchgen/gen.py", line 237, in parse_native_yaml~~
~~_GLOBAL_PARSE_NATIVE_YAML_CACHE[path] = parse_native_yaml_struct(~~
~~File "/usr/local/lib/python3.8/dist-packages/torchgen/gen.py", line 174, in parse_native_yaml_struct~~
~~func, m = NativeFunction.from_yaml(e, loc, valid_tags, ignore_keys)~~
~~File "/usr/local/lib/python3.8/dist-packages/torchgen/model.py", line 548, in from_yaml~~
~~dispatch_key = DispatchKey.parse(k.strip())~~
~~File "/usr/local/lib/python3.8/dist-packages/torchgen/model.py", line 142, in parse~~
~~raise AssertionError(f"unknown dispatch key {value}")~~
~~AssertionError: unknown dispatch key CompositeExplicitAutogradNonFunctional~~
~~in /home/jackcao/pytorch/aten/src/ATen/native/native_functions.yaml:761:~~
~~as_strided_(Tensor(a!) self, int[] size, int[] stride, int? storage_offset=None) -> Tensor(a!)~~
~~Failed to generate lazy files: ['python', '/home/jackcao/pytorch/xla/scripts/gen_lazy_tensor.py']~~
Hmm.. I checked `native_functions.yaml` and I think the real error comes from https://github.com/pytorch/pytorch/pull/80334. I need to verify this locally. Yesterday's wheel build succeeded, so I am sure this is due to a change merged into PyTorch today.
Weird.. even if I sync PyTorch to a commit before https://github.com/pytorch/pytorch/pull/80334, I still see the build error. I don't see any change to `as_strided`, so this is confusing.
I think the `as_strided` issue is something else: it only happens on TPUVM. On CI and my other build machine I see
(pytorch) root@dbc27111d843:/pytorch/xla# python setup.py install
Building torch_xla version: 1.13
XLA Commit ID: 6a03a5dcf6c0c057577a3b9742840040a030298a
PyTorch Commit ID: c78bdd7ba04ae514a46258083c67f165608f3d27
Traceback (most recent call last):
File "/pytorch/xla/scripts/gen_lazy_tensor.py", line 83, in <module>
get_device_fn="torch_xla::bridge::GetXlaDevice")
File "/root/anaconda3/envs/pytorch/lib/python3.7/site-packages/torchgen/gen_lazy_tensor.py", line 357, in run_gen_lazy_tensor
source_yaml, grouped_native_functions, backend_indices
File "/root/anaconda3/envs/pytorch/lib/python3.7/site-packages/torchgen/gen_backend_stubs.py", line 149, in parse_backend_yaml
use_device_guard=use_device_guard,
File "/root/anaconda3/envs/pytorch/lib/python3.7/site-packages/torchgen/gen_backend_stubs.py", line 122, in create_backend_index
), f"Found an invalid operator name: {op_name}"
AssertionError: Found an invalid operator name: kl_div_backward
Failed to generate lazy files: ['python', '/pytorch/xla/scripts/gen_lazy_tensor.py']
This `kl_div_backward` error is what shows up on head, and it should be fixed by https://github.com/pytorch/xla/pull/3683
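For anyone hitting the same assertion: as far as I can tell, it fires when the backend yaml lists an operator that no longer exists in PyTorch's native functions. The snippet below is only a rough sketch of that kind of check, not the actual torchgen code. `find_invalid_ops` and the two tiny op lists are hypothetical stand-ins for illustration.

```python
def find_invalid_ops(backend_ops, native_ops):
    """Return backend op names that are absent from the native op set.

    Hypothetical helper: it mirrors the idea behind the
    "Found an invalid operator name" assertion, not torchgen itself.
    """
    native = set(native_ops)
    # Preserve the order in which the backend lists its ops.
    return [op for op in backend_ops if op not in native]


# Made-up stand-ins for the op names in native_functions.yaml and for the
# ops a backend yaml claims to support.
native_ops = ["abs", "add.Tensor", "kl_div"]
backend_ops = ["abs", "kl_div_backward"]

missing = find_invalid_ops(backend_ops, native_ops)
print(missing)
```

Any name this reports is an op the backend still lists after it disappeared upstream, which is the situation the traceback describes for `kl_div_backward`.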
Ah OK, I think the `CompositeExplicitAutogradNonFunctional` issue is caused by having torch 1.12 preinstalled on the TPUVM. The real build error is related to `kl_div_backward`.
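A quick way to spot this kind of mix-up is to check which `torchgen` Python actually resolves and which torch version is installed, without importing either. This is a minimal sketch under a couple of assumptions: `describe_module` is a hypothetical helper, and I'm assuming `torchgen` ships inside the `torch` distribution.

```python
import importlib.metadata
import importlib.util


def describe_module(name, dist_name=None):
    """Return (origin, version) for a module; None for whatever is unknown."""
    try:
        # find_spec on a top-level name locates the module without importing it.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    origin = spec.origin if spec is not None else None
    try:
        version = importlib.metadata.version(dist_name or name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return origin, version


# Assumption: torchgen is installed as part of the "torch" distribution.
for module, dist in (("torchgen", "torch"), ("torch", "torch")):
    origin, version = describe_module(module, dist)
    print(f"{module}: origin={origin} version={version}")
```

If the printed origin points into `dist-packages` or `site-packages` with an older version while you are building from a source checkout, the build is probably picking up the preinstalled `torchgen` rather than the one from head.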