bad_alloc using PyTorch 1.12

Open peastman opened this issue 2 years ago • 6 comments

I have a model created with TorchMD-Net. I want to use it for running a simulation in OpenMM. That involves compiling to TorchScript, saving to a file, and loading it with the PyTorch C++ API. When I try to do that, it crashes with a bad_alloc down inside libtorch.

Is this expected to work? Or do some of the packages like pyg and torch-cluster not support that workflow? If it's known not to work right now, what would need to happen to make it work?

peastman, Sep 15 '22 23:09

I haven't tried using TorchMD-Net in C++, so I don't know. You could try breaking the model down and exporting submodules individually to narrow down the issue. It may also be worth trying a small pyg message passing example to see whether that is the cause. Apart from pyg's message passing implementation and custom kernels, the implementation uses fairly basic PyTorch functionality.

PhilippThoelke, Sep 16 '22 00:09

For what it's worth, gdb shows the error happens inside torch_cluster.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6e567f1 in __GI_abort () at abort.c:79
#2  0x00007fffc228e036 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007fffc228c524 in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007fffc228c576 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x00007fffc228c768 in __cxxabiv1::__cxa_throw (obj=0x5555597a3f50, 
    tinfo=0x7fffc2380278 <typeinfo for std::bad_alloc>, 
    dest=0x7fffc228b0e4 <std::bad_alloc::~bad_alloc()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
#6  0x00007fffc228cb95 in operator new (sz=140734623278748)
    at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1652324151713/work/build/x86_64-conda-linux-gnu/libstdc++-v3/libsupc++/new:64
#7  0x00007fffb9dc442e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*> ()
   from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#8  0x00007fffb9dc52e6 in c10::RegisterOperators::inferSchemaFromKernels_(c10::OperatorName const&, c10::RegisterOperators::Options const&) ()
   from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#9  0x00007fffb9dc8152 in c10::RegisterOperators::checkSchemaAndRegisterOp_(c10::RegisterOperators::Options&&) ()
   from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007fff553a469d in std::enable_if<c10::guts::is_function_type<long ()>::value&&(!std::is_same<long (), void (c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*)>::value), c10::RegisterOperators&&>::type c10::RegisterOperators::op<long ()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long (*)(), c10::RegisterOperators::Options&&) && ()
   from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch_cluster/_version_cuda.so
#11 0x00007fff553a217d in ?? ()
   from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch_cluster/_version_cuda.so
#12 0x00007ffff7de38d3 in call_init (env=0x555557f25be0, argv=0x7fffffffbb48, argc=2, 
    l=<optimized out>) at dl-init.c:72
#13 _dl_init (main_map=main_map@entry=0x5555598792d0, argc=2, argv=0x7fffffffbb48, 
    env=0x555557f25be0) at dl-init.c:119
#14 0x00007ffff7de839f in dl_open_worker (a=a@entry=0x7fffffff1a70) at dl-open.c:522
#15 0x00007ffff6f7d16f in __GI__dl_catch_exception (exception=0x7fffffff1a50, 
    operate=0x7ffff7de7f60 <dl_open_worker>, args=0x7fffffff1a70) at dl-error-skeleton.c:196
#16 0x00007ffff7de796a in _dl_open (
    file=0x7fff553b9440 "/home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch_cluster/_version_cuda.so", mode=-2147483646, caller_dlopen=0x7ffff7e7df3e <py_dl_open+142>, 
    nsid=<optimized out>, argc=2, argv=<optimized out>, env=0x555557f25be0) at dl-open.c:605

(continuing on up to stack frame #491)

peastman, Sep 16 '22 00:09
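
One detail in the backtrace may be worth a closer look: frame #6 shows operator new being asked for 140734623278748 bytes. The short Python sketch below uses only values copied from the trace and suggests that this number is not a plausible size. Printed in hex it has the shape of a user-space address, and it lies within a few KiB of the frame #10 address inside torch_cluster/_version_cuda.so. That would be consistent with a string object being read with the wrong layout, for example because of a binary incompatibility between the torch_cluster build and libtorch, but that reading is an assumption and the thread does not confirm the root cause.

```python
# Values copied from the gdb backtrace above.
requested_size = 140734623278748      # sz argument of operator new, frame #6
frame_10_addr = 0x00007fff553a469d    # frame #10, in torch_cluster/_version_cuda.so

# Shown in hex, the "size" looks like an address rather than a byte count.
size_hex = hex(requested_size)

# How large the request would be, in TiB.
size_tib = requested_size / 2**40

# Distance between the requested "size" and the frame #10 address.
distance = abs(requested_size - frame_10_addr)

print(size_hex)            # 0x7fff553a669c
print(round(size_tib, 1))  # 128.0
print(distance)            # 8191
```

A request of roughly 128 TiB cannot be satisfied, which is why operator new throws std::bad_alloc here.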

Sorry, it looks like I misdiagnosed what the problem is. The error actually occurs as soon as I import torchmdnet, and it's caused by upgrading to PyTorch 1.12. I create an environment like this:

mamba env create -f environment.yml
conda activate torchmd-net
pip install -e .

At that point things work correctly. So now upgrade PyTorch:

mamba install -c conda-forge pytorch=1.12

and execute the command

python -c "from torchmdnet.models.model import load_model"

It fails with

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted

This is on Ubuntu 20.04. Here's the complete environment.

# packages in environment at /home/peastman/miniconda3/envs/torchmd-net:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.2.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.3            py39hb9d737c_0    conda-forge
aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
alsa-lib                  1.2.3.2              h166bdaf_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
blinker                   1.4                        py_1    conda-forge
brotli                    1.0.9                h166bdaf_7    conda-forge
brotli-bin                1.0.9                h166bdaf_7    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1004    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.9.14            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
certifi                   2022.9.14          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_0    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3            py39hf3d152e_0    conda-forge
colorama                  0.4.5              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.5            py39hf939315_0    conda-forge
coverage                  6.4.4            py39hb9d737c_0    conda-forge
cryptography              37.0.1           py39h9ce1e76_0  
cudatoolkit               11.7.0              hd8887f6_10    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dbus                      1.13.18              hb2f20db_0  
expat                     2.4.9                h27087fc_0    conda-forge
flake8                    5.0.4              pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.0               hc2a2eb6_1    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.37.3           py39hb9d737c_0    conda-forge
freetype                  2.12.1               hca18f0e_0    conda-forge
frozenlist                1.3.1            py39hb9d737c_0    conda-forge
fsspec                    2022.8.2           pyhd8ed1ab_0    conda-forge
future                    0.18.2           py39hf3d152e_5    conda-forge
gettext                   0.21.0               hf68c758_0  
glib                      2.72.1               h6239696_0    conda-forge
glib-tools                2.72.1               h6239696_0    conda-forge
google-auth               2.11.0             pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.1                      py_2    conda-forge
googledrivedownloader     0.4                pyhd3deb0d_1    conda-forge
grpc-cpp                  1.48.1               hc2bec63_1    conda-forge
grpcio                    1.48.1           py39hfaff5cf_1    conda-forge
gst-plugins-base          1.20.2               hcf0ee16_0    conda-forge
gstreamer                 1.20.3               hd4edc92_2    conda-forge
h5py                      3.7.0           nompi_py39hd51670d_101    conda-forge
hdf5                      1.12.2          nompi_h4df4325_100    conda-forge
html5lib                  1.1                pyh9f0ad1d_0    conda-forge
icu                       69.1                 h9c3ff4c_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        4.11.4           py39hf3d152e_0    conda-forge
importlib_metadata        4.11.4               hd8ed1ab_0    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
intel-openmp              2022.1.0          h9e868ea_3769  
isodate                   0.6.1              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4            py39hf939315_0    conda-forge
krb5                      1.19.3               h08a2579_0    conda-forge
lark-parser               0.12.0             pyhd8ed1ab_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h48a1fff_4    conda-forge
libblas                   3.9.0            16_linux64_mkl    conda-forge
libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
libbrotlidec              1.0.9                h166bdaf_7    conda-forge
libbrotlienc              1.0.9                h166bdaf_7    conda-forge
libcblas                  3.9.0            16_linux64_mkl    conda-forge
libclang                  13.0.1          default_hc23dcda_0    conda-forge
libcurl                   7.83.1               h2283fc2_0    conda-forge
libdeflate                1.14                 h166bdaf_0    conda-forge
libedit                   3.1.20210910         h7f8727e_0  
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgfortran-ng            12.1.0              h69a702a_16    conda-forge
libgfortran5              12.1.0              hdcd56e2_16    conda-forge
libglib                   2.72.1               h2d90d5f_0    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0            16_linux64_mkl    conda-forge
libllvm13                 13.0.1               hf817b99_2    conda-forge
libnghttp2                1.47.0               hff17c54_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libogg                    1.3.5                h27cfd23_1  
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.38               h753d276_0    conda-forge
libpq                     14.5                 he2d8382_0    conda-forge
libprotobuf               3.20.1               h6239696_4    conda-forge
libsqlite                 3.39.3               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libtiff                   4.4.0                h55922b4_4    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libvorbis                 1.3.7                he1b5a44_0    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.12               h885dcf4_1    conda-forge
libzlib                   1.2.12               h166bdaf_3    conda-forge
magma                     2.5.4                h6103c52_2    conda-forge
markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.1            py39hb9d737c_1    conda-forge
matplotlib                3.6.0            py39hf3d152e_0    conda-forge
matplotlib-base           3.6.0            py39hf9fd14e_0    conda-forge
mccabe                    0.7.0              pyhd8ed1ab_0    conda-forge
mkl                       2022.1.0           hc2b9512_224  
multidict                 6.0.2            py39hb9d737c_1    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.0.30               h26416b9_1    conda-forge
mysql-libs                8.0.30               hbc51c84_1    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
networkx                  2.8.6              pyhd8ed1ab_0    conda-forge
ninja                     1.11.0               h924138e_0    conda-forge
nnpops                    0.2             cuda112py39hcdac82f_5    conda-forge
nspr                      4.33                 h295c915_0  
nss                       3.78                 h2350873_0    conda-forge
numpy                     1.23.3           py39hba7629e_0    conda-forge
oauthlib                  3.2.1              pyhd8ed1ab_0    conda-forge
openjpeg                  2.5.0                h7d73246_1    conda-forge
openssl                   3.0.5                h166bdaf_2    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.5.0            py39h4661b88_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    9.2.0            py39hd5dbb17_2    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0            py39hf3d152e_3    conda-forge
protobuf                  3.20.1           py39h5a03fae_0    conda-forge
psutil                    5.9.2            py39hb9d737c_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
py                        1.11.0             pyh6c4a22f_0    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.8                      py_0  
pycodestyle               2.9.1              pyhd8ed1ab_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pydeprecate               0.3.2              pyhd8ed1ab_0    conda-forge
pyflakes                  2.5.0              pyhd8ed1ab_0    conda-forge
pyjwt                     2.5.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyqt                      5.12.3           py39hf3d152e_8    conda-forge
pyqt-impl                 5.12.3           py39hde8b62d_8    conda-forge
pyqt5-sip                 4.19.18          py39he80948d_8    conda-forge
pyqtchart                 5.12             py39h0fcd23e_8    conda-forge
pyqtwebengine             5.12.1           py39h0fcd23e_8    conda-forge
pysocks                   1.7.1            py39hf3d152e_5    conda-forge
pytest                    7.1.3            py39hf3d152e_0    conda-forge
pytest-cov                3.0.0              pyhd8ed1ab_0    conda-forge
python                    3.9.13          h2660328_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-louvain            0.15               pyhd8ed1ab_1    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytorch                   1.12.1          cuda112py39ha0cca9b_200    conda-forge
pytorch-gpu               1.12.1          cuda112py39h1894f8f_200    conda-forge
pytorch-lightning         1.6.3              pyhd8ed1ab_0    conda-forge
pytorch_cluster           1.5.9            py39hbba90f3_0    conda-forge
pytorch_geometric         2.0.3              pyhd8ed1ab_0    conda-forge
pytorch_scatter           2.0.9           cuda112py39h83a068c_0    conda-forge
pytorch_sparse            0.6.15           py39h83a068c_0    conda-forge
pytz                      2022.2.1           pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
pyyaml                    6.0              py39hb9d737c_4    conda-forge
qt                        5.12.9               h1304e3e_6    conda-forge
rdflib                    6.2.0              pyhd8ed1ab_0    conda-forge
re2                       2022.06.01           h27087fc_0    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scikit-learn              1.1.2            py39he5e8d7e_0    conda-forge
scipy                     1.9.1            py39h8ba3f38_0    conda-forge
setuptools                59.5.0           py39hf3d152e_0    conda-forge
setuptools-scm            6.3.2              pyhd8ed1ab_0    conda-forge
setuptools_scm            6.3.2                hd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.5.1                h28343ad_2    conda-forge
sqlite                    3.39.3               h4ff8645_0    conda-forge
tensorboard               2.6.0                      py_0  
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
torchani                  2.2.2           cuda112py39h527ec63_6    conda-forge
torchmd-net               0.2.4                     dev_0    <develop>
torchmetrics              0.8.2              pyhd8ed1ab_0    conda-forge
tornado                   6.2              py39hb9d737c_0    conda-forge
tqdm                      4.64.1             pyhd8ed1ab_0    conda-forge
typing-extensions         4.3.0                hd8ed1ab_0    conda-forge
typing_extensions         4.3.0              pyha770c72_0    conda-forge
tzdata                    2022c                h191b570_0    conda-forge
unicodedata2              14.0.0           py39hb9d737c_1    conda-forge
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
webencodings              0.5.1                      py_1    conda-forge
werkzeug                  2.2.2              pyhd8ed1ab_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yarl                      1.8.1            py39h5eee18b_0  
zipp                      3.8.1              pyhd8ed1ab_0    conda-forge
zlib                      1.2.12               h166bdaf_3    conda-forge
zstd                      1.5.2                h6239696_4    conda-forge

peastman, Sep 22 '22 21:09
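
The listing is long, so it can help to pull out only the PyTorch-related rows. A small sketch follows, using a hypothetical `parse_conda_list` helper (not part of conda) and a handful of rows copied verbatim from the listing above. It assumes whitespace-separated columns and treats the channel column as optional, since some rows leave it empty.

```python
def parse_conda_list(text):
    """Parse `conda list` output into dicts; the channel column may be empty."""
    packages = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header rows.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        packages.append({
            "name": fields[0],
            "version": fields[1],
            "build": fields[2],
            "channel": fields[3] if len(fields) > 3 else "",
        })
    return packages

# Rows copied from the listing above.
listing = """
# Name                    Version                   Build  Channel
cudatoolkit               11.7.0              hd8887f6_10    conda-forge
pytorch                   1.12.1          cuda112py39ha0cca9b_200    conda-forge
pytorch_cluster           1.5.9            py39hbba90f3_0    conda-forge
pytorch_scatter           2.0.9           cuda112py39h83a068c_0    conda-forge
pytorch_sparse            0.6.15           py39h83a068c_0    conda-forge
yarl                      1.8.1            py39h5eee18b_0
"""

torch_related = [
    p for p in parse_conda_list(listing) if p["name"].startswith("pytorch")
]
for p in torch_related:
    print(p["name"], p["version"], p["build"])
```

In these rows the pytorch and pytorch_scatter build strings carry a cuda112 tag, while pytorch_cluster and pytorch_sparse do not. Whether that difference is related to the crash is not established in this thread.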

Torch-cluster is indeed the source of the problem. I replaced the conda-forge build with one from PyPI with

pip install --force torch_cluster

and the crash went away.

peastman, Sep 27 '22 19:09
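
Because the abort terminates the interpreter before any Python exception can be caught, one way to confirm which package triggers it is to import each candidate in a child process and look at the exit status. The following is a minimal sketch; the helper names `import_survives` and `first_failing_import` are hypothetical and not part of any library.

```python
import subprocess
import sys


def import_survives(module_name: str) -> bool:
    """Return True if `import module_name` exits cleanly in a child interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On Linux, a process killed by a signal (such as the SIGABRT raised by
    # std::terminate) reports a negative return code; an ordinary ImportError
    # gives exit code 1. Anything other than 0 counts as a failure here.
    return result.returncode == 0


def first_failing_import(module_names):
    """Return the first module whose import fails, or None if all succeed."""
    for name in module_names:
        if not import_survives(name):
            return name
    return None
```

For this thread, a call such as `first_failing_import(["torch", "torch_cluster", "torchmdnet"])` would be a natural starting point; that candidate list is an assumption and should be adjusted to the environment being tested.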

Did you install the same version of torch_cluster with conda and pip?

raimis, Sep 28 '22 09:09

They're slightly different versions. The PyPI version is 1.6.0, but the most recent version on conda-forge is 1.5.9.

peastman, Sep 28 '22 15:09
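
When comparing a conda install against a pip install, it can be useful to ask the running interpreter which version it actually sees. A minimal sketch using the standard library's `importlib.metadata` follows. The helper name `installed_version` is hypothetical, and the distribution names passed to it are assumptions: the conda-forge package is listed as pytorch_cluster, and the name under which its metadata is registered may differ from the PyPI one.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names here are assumptions; adjust them to your environment.
for name in ("torch", "torch_cluster"):
    print(name, installed_version(name))
```

Running this once in each environment gives a quick side-by-side of the versions in use.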

Closing this since torch_cluster is no longer a dependency.

RaulPPelaez, Jan 17 '24 15:01