`tensor.shape` is not yet implemented for dynamic tensors
The following test fails for Dynamic Shape ops. This is mostly because XLASymIntNodeImpl doesn't support ToString():
>>> a2.shape
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: RuntimeError: NYI
In the above test, a2 is a dynamic tensor.
CC @vanbasten23
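For context, the NYI comes from the default str() in c10/core/SymIntNodeImpl.h (see the stack trace further down in this thread), which a backend SymIntNode subclass such as XLASymIntNodeImpl is expected to override. Below is a minimal standalone sketch of that relationship; the class bodies are illustrative only and are not the actual PyTorch/XLA sources.

```cpp
// Standalone sketch, NOT the actual PyTorch sources: illustrates why printing the
// shape of a dynamic tensor raises "NYI" until the XLA backend overrides str().
#include <iostream>
#include <stdexcept>
#include <string>

struct SymIntNodeImpl {
  virtual ~SymIntNodeImpl() = default;
  // Default behavior in c10 is TORCH_CHECK(false, "NYI"); modeled here with a
  // plain exception, so shape printing fails unless a backend overrides str().
  virtual std::string str() { throw std::runtime_error("NYI"); }
};

// Hypothetical XLA-side override: return a description of the wrapped IR node
// instead of raising.
struct XLASymIntNodeImpl : SymIntNodeImpl {
  explicit XLASymIntNodeImpl(std::string ir_description)
      : ir_description_(std::move(ir_description)) {}
  std::string str() override { return ir_description_; }
  std::string ir_description_;
};

int main() {
  XLASymIntNodeImpl dynamic_dim("SizeNode");
  std::cout << dynamic_dim.str() << "\n";  // prints "SizeNode" instead of NYI
  return 0;
}
```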
In response to the first comment: at https://github.com/pytorch/pytorch/blob/d39e9c1e9087069fa774b0e3eb47e04750edca88/c10/core/SymIntNodeImpl.h#L85, I changed the error to a more specific string, such as "str() NYI". Then I rebuilt pytorch and ran the commands:
>>> a1 = torch.tensor([[1,0,0,5,0,6]], device=dev)
>>> a2 = torch.nonzero(a1)
>>> a2.shape
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: RuntimeError: NYI
That suggests implementing str() for XLASymIntNodeImpl may not be enough. Do you think we need to register the python dispatcher to the XLA key, as we discussed in today's meeting? @Krovatkin Also, Will Constable mentioned he would share the instructions on how to register the python dispatcher to the XLA key. Do you know where I can find those instructions?
Okay, I got the C++ stack trace:
(pytorch) root@t1v-n-cf794107-w-0:/# python3
Python 3.8.8 (default, Apr 13 2021, 19:58:26)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, torch_xla, torch_xla.core.xla_model as xm
>>> dev = xm.xla_device()
>>> a1 = torch.tensor([[1,0,0,5,0,6]], device=dev)
>>> a2 = torch.nonzero(a1)
>>> a2.shape
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: RuntimeError: NYI
Exception raised from str at /pytorch/c10/core/SymIntNodeImpl.h:83 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x7d (0x7f27d86ea23d in /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xdd (0x7f27d86e895d in /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: torch_xla::XLATensor::~XLATensor() + 0 (0x7f27d0920950 in /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/_XLAC.cpython-38-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x89002a (0x7f27e3c1302a in /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x23436c (0x7f27e35b736c in /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #11: <unknown function> + 0x761490 (0x7f27e3ae4490 in /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0x78e0e6 (0x7f27e3b110e6 in /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #29: __libc_start_main + 0xeb (0x7f27ed76f09b in /lib/x86_64-linux-gnu/libc.so.6)
>>>
I'm confused. In my local /pytorch/c10/core/SymIntNodeImpl.h, I changed virtual std::string str() to

virtual std::string str() {
  TORCH_CHECK(false, "C10_API SymIntNodeImpl str() is NYI");
}

then built it via python setup.py install under pytorch/, went back to HOME, and ran the above python commands. Wouldn't it pick up my local change?
@vanbasten23 @miladm
I'm not quite seeing what you both are seeing in https://github.com/pytorch/xla/pull/4073.
The test included in the PR prints:
__str__ BEGIN
__str__ END
XLASymIntNodeImpl
after printing shape
which seems to indicate that we are indeed hitting the XLASymIntNodeImpl implementation?
I'm using the following configuration options:
export XLA_EXPERIMENTAL="nonzero:masked_select"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"
export XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
@vanbasten23 @miladm
Added an example of printing the static value of a DimensionNode
here: https://github.com/pytorch/xla/pull/4073/commits/1ff4bae6b9fb32ecbf4f72a05dd4e513f0a68e60
This is the output:
(pytorch) root@9471890a681a:/home/pytorch/xla/test# python test_str.py
__str__ BEGIN
__str__ END
IR=SizeNode, static=6
after printing shape
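For readers following along, output like "IR=SizeNode, static=6" can be produced by having str() report both the IR node kind and the static upper bound of the dynamic dimension. The standalone sketch below only models that idea; the DimensionNode/SizeNode/getStaticValue names are inferred from the output and the PR description, not copied from the torch_xla sources.

```cpp
// Standalone sketch (not the torch_xla source): how a SymIntNode's str() could
// surface both the IR node kind and the static (upper-bound) value, producing
// output like "IR=SizeNode, static=6".
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Assumed interface: a dimension node that knows a static upper bound for the
// dynamic dimension it represents.
struct DimensionNode {
  virtual ~DimensionNode() = default;
  virtual std::string name() const = 0;
  virtual int64_t getStaticValue() const = 0;  // hypothetical accessor
};

struct SizeNode : DimensionNode {
  explicit SizeNode(int64_t static_value) : static_value_(static_value) {}
  std::string name() const override { return "SizeNode"; }
  int64_t getStaticValue() const override { return static_value_; }
  int64_t static_value_;
};

// The string used when Python asks for a symbolic size's representation.
std::string SymIntStr(const DimensionNode& dim) {
  std::ostringstream ss;
  ss << "IR=" << dim.name() << ", static=" << dim.getStaticValue();
  return ss.str();
}

int main() {
  SizeNode size(6);                      // e.g. nonzero() on a 6-element tensor
  std::cout << SymIntStr(size) << "\n";  // prints "IR=SizeNode, static=6"
  return 0;
}
```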
> ...then built it via python setup.py install under pytorch/, went back to HOME, and ran the above python commands. Wouldn't it pick up my local change?
This workflow works perfectly for me. I actually added the __str__ BEGIN / __str__ END prints in the pytorch bindings (jit/python/init.cpp) and rebuilt pytorch with python setup.py install. For this kind of change I didn't need to rebuild xla, but to be on the safe side you could rebuild XLA as well.
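To illustrate the debugging technique described above, here is a sketch (not the actual torch/csrc/jit/python/init.cpp code) of bracketing the __str__ path with BEGIN/END markers: if the markers show up at runtime, the rebuilt binary is the one being loaded.

```cpp
// Sketch only: stand-in for the code behind SymIntNode.__str__, with debug
// markers added to verify that a rebuilt binary is actually being loaded.
#include <iostream>
#include <string>

struct SymIntNodeImpl {
  virtual ~SymIntNodeImpl() = default;
  virtual std::string str() { return "NYI"; }
};

// Roughly what the bound __str__ path does, plus the debug markers.
std::string SymIntNodeStr(SymIntNodeImpl& node) {
  std::cerr << "__str__ BEGIN" << std::endl;  // printed on entry
  std::string out = node.str();               // dispatches to the backend override
  std::cerr << "__str__ END" << std::endl;    // printed on exit
  return out;
}

int main() {
  SymIntNodeImpl node;
  std::cout << SymIntNodeStr(node) << "\n";
  return 0;
}
```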
I have a few theories:
- when you ran python setup.py install, it may not have built successfully
- python setup.py install was accidentally run in the wrong folder
- for some reason python setup.py install didn't overwrite the existing installed package; you could try pip uninstall torch first and then python setup.py install again
Also, you could turn your changes in pytorch into a commit (e.g. git add -u; git commit -m "XX"), and when you load pytorch you could print torch.__version__ and double-check it matches your commit. This way you can be sure you are using the right version of pytorch.
The bug is fixed. Closing.