NNPOps
Torch problems when using NNPOps with Openmm-ML
Thanks for the great ecosystem for ML potentials in MD!
I tried running this simple `openmm-ml` example that uses `createSystem`:
#!/usr/bin/env python3
from openmm.app import *
from openmm import *
from openmm.unit import *
from openmmml import MLPotential
from sys import argv,stdout
# must be either "nnpops" or "torchani"
implementation = argv[1]
input_file = argv[2]
pdb = PDBFile(input_file)
print("Creating ANI potential")
potential = MLPotential('ani2x')
print("Creating system")
system = potential.createSystem(pdb.topology, implementation=implementation)
print("Creating simulation")
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.004*picoseconds)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
print("Minimizing energy")
simulation.minimizeEnergy()
print("Simulating")
simulation.reporters.append(StateDataReporter(stdout, 1000, step=True,
potentialEnergy=True, temperature=True))
simulation.step(10000)
print("done")
I'm using a simple methane PDB file:
HETATM 1 C1 UNK 0 -0.238 0.373 0.000 1.00 0.00 C
HETATM 2 H1 UNK 0 -0.238 1.486 0.000 1.00 0.00 H
HETATM 3 H2 UNK 0 -1.286 0.002 -0.057 1.00 0.00 H
HETATM 4 H3 UNK 0 0.335 0.002 -0.879 1.00 0.00 H
HETATM 5 H4 UNK 0 0.236 0.002 0.936 1.00 0.00 H
END
When I specify the `torchani` implementation, everything goes through OK.
However, when I try to use `nnpops`, I get the following stacktrace (when running the energy minimization):
Traceback (most recent call last):
File "/scratch/openmm-nnp/./run_md.py", line 28, in <module>
simulation.minimizeEnergy()
File "/scratch/.conda/envs/openmm_nnp/lib/python3.10/site-packages/openmm/app/simulation.py", line 137, in minimizeEnergy
mm.LocalEnergyMinimizer.minimize(self.context, tolerance, maxIterations)
File "/scratch/.conda/envs/openmm_nnp/lib/python3.10/site-packages/openmm/openmm.py", line 8544, in minimize
return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
openmm.OpenMMException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "<string>", line 57, in <backward op>
self_scalar_type = self.dtype
def backward(grad_output):
grad_self = AD_sum_backward(grad_output, self_size, dim, keepdim).to(self_scalar_type) / AD_safe_size(self_size, dim)
~~~~~~~~~~~~~~~ <--- HERE
return grad_self, None, None, None
File "<string>", line 24, in AD_sum_backward
if not keepdim and len(sizes) > 0:
if len(dims) == 1:
return grad.unsqueeze(dims[0]).expand(sizes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
else:
res = AD_unsqueeze_multiple(grad, dims, len(sizes))
RuntimeError: expand(CUDADoubleType{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
I'm using the `mmh/openmm-8-beta-linux` environment (via the command `mamba env create mmh/openmm-8-beta-linux`) on a Debian Bullseye system with an NVIDIA T4.
My full environment dump (`conda env export`):
channels:
- conda-forge/label/openmm-torch_rc
- conda-forge/label/openmm_rc
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_kmp_llvm
- attrs=22.2.0=pyh71513ae_0
- brotlipy=0.7.0=py310h5764c6d_1005
- bzip2=1.0.8=h7f98852_4
- c-ares=1.18.1=h7f98852_0
- ca-certificates=2022.12.7=ha878542_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- certifi=2022.12.7=pyhd8ed1ab_0
- cffi=1.15.1=py310h255011f_3
- charset-normalizer=2.1.1=pyhd8ed1ab_0
- colorama=0.4.6=pyhd8ed1ab_0
- cryptography=39.0.0=py310h34c0648_0
- cudatoolkit=11.8.0=h37601d7_11
- cudnn=8.4.1.50=hed8a83a_0
- exceptiongroup=1.1.0=pyhd8ed1ab_0
- h5py=3.7.0=nompi_py310h416281c_102
- hdf5=1.12.2=nompi_h4df4325_101
- icu=70.1=h27087fc_0
- idna=3.4=pyhd8ed1ab_0
- importlib-metadata=6.0.0=pyha770c72_0
- importlib_metadata=6.0.0=hd8ed1ab_0
- iniconfig=2.0.0=pyhd8ed1ab_0
- keyutils=1.6.1=h166bdaf_0
- krb5=1.20.1=h81ceb04_0
- lark-parser=0.12.0=pyhd8ed1ab_0
- ld_impl_linux-64=2.39=hcc3a1bd_1
- libaec=1.0.6=h9c3ff4c_0
- libblas=3.9.0=16_linux64_openblas
- libcblas=3.9.0=16_linux64_openblas
- libcurl=7.87.0=hdc1c0ab_0
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=h516909a_1
- libffi=3.4.2=h7f98852_5
- libgcc-ng=12.2.0=h65d4601_19
- libgfortran-ng=12.2.0=h69a702a_19
- libgfortran5=12.2.0=h337968e_19
- libhwloc=2.8.0=h32351e8_1
- libiconv=1.17=h166bdaf_0
- liblapack=3.9.0=16_linux64_openblas
- libnghttp2=1.51.0=hff17c54_0
- libnsl=2.0.0=h7f98852_0
- libopenblas=0.3.21=pthreads_h78a6416_3
- libprotobuf=3.21.12=h3eb15da_0
- libsqlite=3.40.0=h753d276_0
- libssh2=1.10.0=hf14f497_3
- libstdcxx-ng=12.2.0=h46fd767_19
- libuuid=2.32.1=h7f98852_1000
- libxml2=2.10.3=hca2bb57_1
- libzlib=1.2.13=h166bdaf_4
- llvm-openmp=15.0.6=he0ac6c6_0
- magma=2.5.4=hc72dce7_4
- mkl=2022.2.1=h84fe81f_16997
- nccl=2.14.3.1=h0800d71_0
- ncurses=6.3=h27087fc_1
- ninja=1.11.0=h924138e_0
- nnpops=0.2=cuda112py310h8b99da5_5
- numpy=1.24.1=py310h08bbf29_0
- ocl-icd=2.3.1=h7f98852_0
- ocl-icd-system=1.0.0=1
- openmm=8.0.0beta=py310h2996cf7_2
- openmm-ml=1.0beta=pyh79ba5db_2
- openmm-torch=1.0beta=cuda112py310h02d4f52_2
- openssl=3.0.7=h0b41bf4_1
- packaging=22.0=pyhd8ed1ab_0
- pip=22.3.1=pyhd8ed1ab_0
- pluggy=1.0.0=pyhd8ed1ab_5
- pycparser=2.21=pyhd8ed1ab_0
- pyopenssl=23.0.0=pyhd8ed1ab_0
- pysocks=1.7.1=pyha2e5f31_6
- pytest=7.2.0=pyhd8ed1ab_2
- python=3.10.8=h4a9ceb5_0_cpython
- python_abi=3.10=3_cp310
- pytorch=1.12.1=cuda112py310he33e0d6_201
- readline=8.1.2=h0f457ee_0
- requests=2.28.1=pyhd8ed1ab_1
- setuptools=59.5.0=py310hff52083_0
- setuptools-scm=6.3.2=pyhd8ed1ab_0
- setuptools_scm=6.3.2=hd8ed1ab_0
- sleef=3.5.1=h9b69904_2
- tbb=2021.7.0=h924138e_1
- tk=8.6.12=h27826a3_0
- tomli=2.0.1=pyhd8ed1ab_0
- torchani=2.2.2=cuda112py310h98dee98_6
- typing_extensions=4.4.0=pyha770c72_0
- tzdata=2022g=h191b570_0
- urllib3=1.26.13=pyhd8ed1ab_0
- wheel=0.38.4=pyhd8ed1ab_0
- xz=5.2.6=h166bdaf_0
- zipp=3.11.0=pyhd8ed1ab_0
I've seen some mention of similar problems, but haven't been able to find the solution.
Any help is greatly appreciated. Apologies if this isn't the correct repo to open this issue in.
Thanks!
This was fixed in https://github.com/openmm/NNPOps/pull/71, but we haven't released the new version.
That means we need to build a new release today, since the fix is required for the OpenMM 8 release candidate. Can you build it?
Ah, that's great to hear! Apologies, I thought that fix was in the current version.
I'll eagerly await the new release.
Thanks!
@zanebeckwith NNPOps 0.3 has been released. Could you confirm whether it fixes the issue?
@raimis yes, this fixed it!
And it is significantly faster (at least 10x) than the torchani version. That's awesome!
Thank you!
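For anyone else hitting the same stacktrace, one quick way to see whether an environment predates the fix is to scan the `conda env export` output for the `nnpops` pin. Below is a minimal sketch, assuming the `- name=version=build` line format shown in the dump above and that the fix first shipped in 0.3 as stated in this thread. The helper names (`parse_conda_pin`, `version_tuple`, `nnpops_has_fix`) are hypothetical and not part of conda or NNPOps.

```python
def parse_conda_pin(line):
    """Split a `conda env export` dependency line into (name, version, build)."""
    text = line.strip()
    if text.startswith("- "):
        text = text[2:]
    parts = text.split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else ""
    build = parts[2] if len(parts) > 2 else ""
    return name, version, build


def version_tuple(version):
    """Turn '0.2' into (0, 2); stop at the first non-numeric component."""
    numbers = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        numbers.append(int(piece))
    return tuple(numbers)


def nnpops_has_fix(export_lines, fixed_in=(0, 3)):
    """Return True or False if an nnpops pin is found, or None if it is absent."""
    for line in export_lines:
        name, version, _ = parse_conda_pin(line)
        if name == "nnpops":
            return version_tuple(version) >= fixed_in
    return None


# A few lines copied from the environment dump in the original report.
dump = [
    "- ninja=1.11.0=h924138e_0",
    "- nnpops=0.2=cuda112py310h8b99da5_5",
    "- numpy=1.24.1=py310h08bbf29_0",
]
print(nnpops_has_fix(dump))
```

With the pin from the original report (`nnpops=0.2`), the check reports that the fix is missing. Comparing parsed tuples rather than raw strings avoids mis-ordering versions such as `0.10` and `0.3`.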
It's still 10-50x slower than OpenFF for some simple small molecules (methane, alanine). This is a limited test, so I don't put too much stock in the numbers.
But, to set my expectations, what is the expected performance compared to a traditional force field, and what do I need to be aware of to get that maximum performance?
(I can ask that question elsewhere if there's a better forum. Thank you for your help!)
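To put comparisons like the "10-50x" figure above on a common footing, timings can be converted to nanoseconds of simulated time per day of wall-clock time. Below is a minimal sketch of that arithmetic, using the step count and timestep from the script in the original report. The function names are hypothetical, and the wall-clock times are placeholders for illustration, not measurements.

```python
def ns_per_day(n_steps, timestep_ps, wall_seconds):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = n_steps * timestep_ps / 1000.0  # 1000 ps per ns
    return simulated_ns * 86400.0 / wall_seconds   # 86400 s per day


def slowdown(reference_ns_per_day, candidate_ns_per_day):
    """How many times slower the candidate is than the reference."""
    return reference_ns_per_day / candidate_ns_per_day


# Run parameters taken from the script in the original report.
N_STEPS = 10000
TIMESTEP_PS = 0.004

# Placeholder wall-clock times in seconds; substitute your own measurements.
classical_wall_s = 2.0
ml_wall_s = 40.0

classical_rate = ns_per_day(N_STEPS, TIMESTEP_PS, classical_wall_s)
ml_rate = ns_per_day(N_STEPS, TIMESTEP_PS, ml_wall_s)
print(f"classical: {classical_rate:.1f} ns/day")
print(f"ML potential: {ml_rate:.1f} ns/day")
print(f"slowdown: {slowdown(classical_rate, ml_rate):.1f}x")
```

For a fair comparison, the wall-clock time should cover only the `simulation.step(...)` call, since system creation and energy minimization would otherwise dominate a run this short.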