Exception handling in Linux binaries seems off
This only happens on one of my machines. It does not happen on our CI machines. Could just be a me-problem.
Repro:
import torch
from functorch import vmap
x = torch.randn(2, 3, 5)
vmap(lambda x: x, out_dims=3)(x)
Produces:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 366, in wrapped
return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched
flat_outputs = [
File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 166, in <listcomp>
_remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
RuntimeError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
Exception raised from maybe_wrap_dim_slow at ../c10/core/WrapDimMinimal.cpp:29 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f10a018e612 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::maybe_wrap_dim_slow(long, long, bool) + 0x3d3 (0x7f10a017c023 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: at::functorch::_remove_batch_dim(at::Tensor const&, long, long, long) + 0x5e8 (0x7f0ff6088678 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
frame #3: <unknown function> + 0x23b502 (0x7f0ff608c502 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
frame #4: <unknown function> + 0x1ff6e2 (0x7f0ff60506e2 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
<omitting python frames>
frame #27: __libc_start_main + 0xf3 (0x7f10f1ae70b3 in /lib/x86_64-linux-gnu/libc.so.6)
I would expect the error message to look like the following:
>>> vmap(lambda x: x, out_dims=3)(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 361, in wrapped
return _flat_vmap(
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 488, in _flat_vmap
return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 165, in _unwrap_batched
flat_outputs = [
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 166, in <listcomp>
_remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
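Put differently: the c10::Error is supposed to be translated into a Python IndexError (presumably at the binding layer), but on the affected machine it surfaces as a plain RuntimeError with a raw C++ backtrace. A minimal sketch for checking which behavior a given machine exhibits, reusing the repro above:

import torch
from functorch import vmap

x = torch.randn(2, 3, 5)
try:
    vmap(lambda x: x, out_dims=3)(x)
except IndexError:
    # expected behavior: the c10 error is translated to IndexError
    print("OK: exception translated to IndexError")
except RuntimeError as e:
    # affected machines: the error escapes translation as RuntimeError
    print("BUG: exception arrived as RuntimeError:", e)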
Looks like a runtime-library compatibility problem (i.e., one C++ runtime does not know how to talk to another one, or how to parse its unwind instructions).
Though it works for me on Ubuntu 18.04 when running the following commands:
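One rough way to probe for that (my own heuristic, Linux-only, not a definitive diagnosis): import both extensions and check whether more than one distinct copy of libstdc++ is mapped into the process, since two different C++ runtimes in one process can disagree on how exceptions unwind across DSO boundaries:

import torch       # loads torch/lib/libc10.so and friends
import functorch   # loads functorch/_C.so

# List every libstdc++ mapped into this process; more than one distinct
# copy is a hint that two C++ runtimes are in play.
with open("/proc/self/maps") as f:
    print({line.split()[-1] for line in f if "libstdc++" in line})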
$ conda create -n py38-torch112-cpu python=3.8
$ conda activate py38-torch112-cpu
$ python3 -mpip install --pre torch==1.12 -f https://download.pytorch.org/whl/test/cpu/torch_test.html
$ pip install functorch-0.2.0-cp38-cp38-linux_x86_64.whl
$ python
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from functorch import vmap
>>> x=torch.rand(2, 3, 5)
<stdin>:1: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:68.)
>>> vmap(lambda x: x, out_dims=3)(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/fsx/users/nshulga/conda/envs/py38-torch112-cpu/lib/python3.8/site-packages/functorch/_src/vmap.py", line 366, in wrapped
return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
File "/fsx/users/nshulga/conda/envs/py38-torch112-cpu/lib/python3.8/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched
flat_outputs = [
File "/fsx/users/nshulga/conda/envs/py38-torch112-cpu/lib/python3.8/site-packages/functorch/_src/vmap.py", line 166, in <listcomp>
_remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
There's also a related problem (which was discussed offline): the PyTorch cu102 binaries don't include the _ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev symbol, but the PyTorch cpu/cu113/cu116 binaries do. On some systems libstdc++.so.6 doesn't provide it either, which leads to a missing-symbol error on import functorch.
Just to clarify:
$ c++filt _ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev
std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream()
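A quick way to test whether a given system would be affected (a sketch of my own, assuming the cu102 binaries expect this symbol to come from the system libstdc++) is to look the symbol up with ctypes:

import ctypes

sym = "_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev"
libstdcxx = ctypes.CDLL("libstdc++.so.6")
try:
    # ctypes raises AttributeError if the shared library lacks the symbol
    getattr(libstdcxx, sym)
    print("libstdc++.so.6 provides the symbol")
except AttributeError:
    print("libstdc++.so.6 is missing the symbol; import functorch would fail")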