openmm-torch
Segmentation fault upon creating the Context when adding both RMSD biased force and Torch Force
I've been using OpenMM 7.7.0 and OpenMM-Torch 0.8 successfully to run a PyTorch model; however, when I add an RMSD-biasing force to the system as well as the TorchForce, I get a segmentation fault upon creating the Context. The RMSD-biasing force has also worked on its own without issue. My system setup is as follows:
```python
# Import OpenMM libraries
from openmm.app import *
from openmm import *
from openmm.unit import *
from sys import stdout

# Import OpenMM-Torch
from openmmtorch import TorchForce

# Import torch_cluster (from PyTorch-Geometric)
from torch_cluster import radius_graph

# Load structure / force field
pdb = PDBFile('struct.pdb')
ff = ForceField('amber14-all.xml')

# Build system
system = ff.createSystem(pdb.topology, nonbondedMethod=NoCutoff, constraints=HBonds)

# Initialize the TorchForce
ml_model = TorchForce('model.pt')
scaler = 1

# Create TorchForce as a CustomCVForce
U_ml = CustomCVForce('scaler*ml_model')

# Add parameters to the CustomCVForce
U_ml.addCollectiveVariable('ml_model', ml_model)
U_ml.addGlobalParameter('scaler', scaler)

# Add force to the system
system.addForce(U_ml)

# Load reference positions for the RMSD force
ref_coords = pdb.positions

# Get atom indices of backbone heavy atoms for the RMSD calculation
atom_idx = []
idx = 0
for atom in pdb.topology.atoms():
    if atom.name == 'CA':
        atom_idx.append(idx)
    if atom.name == 'C':
        atom_idx.append(idx)
    if atom.name == 'N':
        atom_idx.append(idx)
    if atom.name == 'O':
        atom_idx.append(idx)
    idx = idx + 1

# Set up the RMSD calculation / initialize k_rmsd and rmsd_0
rmsd = RMSDForce(ref_coords, atom_idx)
k_rmsd = 1000  # (kJ / mol / nm^2)
rmsd_0 = 0.2   # (nm)

# Create harmonic RMSD-biasing force as a CustomCVForce
U_rmsd = CustomCVForce('0.5*k_rmsd*(rmsd - rmsd_0)^2')

# Add parameters to the CustomCVForce
U_rmsd.addCollectiveVariable('rmsd', rmsd)
U_rmsd.addGlobalParameter('k_rmsd', k_rmsd)
U_rmsd.addGlobalParameter('rmsd_0', rmsd_0)

# Add force to the system
system.addForce(U_rmsd)

# Create the integrator / platform
integrator = LangevinMiddleIntegrator(340*kelvin, 1/picosecond, 0.0025*picoseconds)
platform = Platform.getPlatformByName('Reference')

# Build simulation
sim = Simulation(pdb.topology, system, integrator, platform)
```
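(For context, `model.pt` is a serialized TorchScript module; TorchForce expects its `forward()` to take the particle positions as an (N, 3) tensor in nm and return a scalar energy in kJ/mol. The sketch below is only an illustrative stand-in that exercises `radius_graph`, not the actual model from this issue; the cutoff and pair term are placeholders.)

```python
# Purely illustrative stand-in for model.pt (NOT the actual model): a TorchScript
# module that maps positions (nm) to a scalar energy (kJ/mol), as TorchForce expects.
import torch
from torch_cluster import radius_graph

class ToyEnergy(torch.nn.Module):
    def forward(self, positions):
        # Build a neighbour graph within a hypothetical 0.5 nm cutoff
        edge_index = radius_graph(positions, r=0.5)
        src, dst = edge_index[0], edge_index[1]
        # Simple pairwise term so the module returns a single scalar
        diff = positions[dst] - positions[src]
        return torch.sum(diff * diff)

torch.jit.script(ToyEnergy()).save('model.pt')
```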
As stated above, building the Context with Simulation results in a segmentation fault. I've tried implementing this in various other ways that lead to the same result, including:

- Using OpenMM 8.0 Beta and OpenMM-Torch 1.0 Beta
- Adding the TorchForce directly without using CustomCVForce: `system.addForce(ml_model)`
- Adding the TorchForce and RMSD force as collective variables of a single CustomCVForce: `U_rmsd_ml = CustomCVForce('scaler*ml_model + 0.5*k_rmsd*(rmsd - rmsd_0)^2')`
- Effectively turning off the TorchForce by setting `scaler = 0`
- Building the Context without using Simulation: `context = Context(system, integrator, platform)`
- Using the CPU platform
- Switching the order in which I add the forces

All of these result in the same segmentation fault when the Context is built. Again, the model runs without issue when added to the system on its own, as does the RMSD-biasing force. Any help with this issue would be greatly appreciated!
The files `struct.pdb` and `model.pt` can be found in the following zipped folder: struct_model.zip
Could you share `struct.pdb` and a script to generate `model.pt`, so it is possible to reproduce the issue? Also, could you add the imports to the script, so it is possible to run it?
I've edited my original post to include the imports and the files `struct.pdb` and `model.pt`.
Your script runs fine for me using the latest code for OpenMM and for this plugin. I notice your model uses the `torch_cluster` package. How did you install it? Possibly it was compiled in a way that's incompatible with this plugin. Can you post the output of `conda list`?
Try running your script inside `gdb`. Let it run until it hits the segfault, then type `bt` to get a stack trace for where it happened and post it here.
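Concretely (the script name here is a placeholder), that would look something like:

```
gdb --args python run_sim.py
(gdb) run    # runs the script until it segfaults
(gdb) bt     # prints the stack trace to post here
```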
I installed `torch_cluster` into a clean conda environment with OpenMM 8.0 beta and OpenMM-Torch 1.0 beta as follows:

```
conda create -n torch_omm8b openmm openmm-torch -c "conda-forge/label/openmm_rc" -c "conda-forge/label/openmm-torch_rc"
conda install scipy
conda install mdtraj -c conda-forge
pip install torch-cluster -f https://data.pyg.org/whl/torch-1.11.0+cu112.html
```

The following text file contains the output from `conda list`: conda_list_omm8b_env.txt, and this text file contains the backtrace from running my script in `gdb`: gdb_bt_omm8b_env.txt
That build is likely incompatible with packages from conda-forge. Try installing it like this instead:

```
conda install -c conda-forge pytorch_cluster
```
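A quick way to verify that the build loads correctly (just a generic sanity check, not something from this thread) is:

```python
import torch
from torch_cluster import radius_graph

# Ten random points; if the compiled extension is compatible with the installed
# PyTorch, this runs without an import or undefined-symbol error.
pos = torch.rand(10, 3)
edge_index = radius_graph(pos, r=0.5)
print(edge_index.shape)  # expected: torch.Size([2, num_edges])
```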
I have created the environment:

```
conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge pytorch_cluster
```

The script works without problems.
@JustinAiras try to create a new environment as indicated with the latest (22.9.0) `conda`.
I've run the exact set of commands you've provided using `conda` 22.9.0, but after `from torch_cluster import radius_graph` I get the following error message:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/__init__.py", line 18, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/_grid_cuda.so: undefined symbol: _ZN3c106detail19maybe_wrap_dim_slowEllb
```
@JustinAiras this might be a `conda` issue (https://github.com/openmm/openmm-torch/issues/88#issuecomment-1310477870). Could you try to install with `mamba`?
Thank you, installing with mamba solved my most immediate issue, and I can now run MD with a TorchForce and an RMSD-biasing force without encountering a segmentation fault.

I installed mamba into the base environment of a clean miniconda install and created a new environment as follows:

```
mamba create -n torch_omm8b openmm openmm-torch pytorch_cluster -c "conda-forge/label/openmm_rc" -c "conda-forge/label/openmm-torch_rc" -c conda-forge
```

Note that this also worked with a mambaforge installation, but differences in cluster permissions required me to use miniconda. Also note that `pytorch_cluster` needs to be installed at the same time as `openmm-torch`, as I get the following error otherwise:

```
- nothing provides __cuda needed by pytorch-1.12.1-cuda102py310ha664643_201
```

For my purposes (I only need the CPU platform), installing with the above command resolves my issue. However, I still get issues if I try to use the CUDA platform. Upon building the simulation, I get the following error:
File "/home/gridsan/jairas/work/small_prot_MD/chignolin/MD/torch_md/best_model/umbrella/rmsd_bias/GPU/torch_umb.py", line 79, in <module>
sim = Simulation(pdb.topology, system, integrator, platform)
File "/home/gridsan/jairas/miniconda3/envs/torch_omm8b/lib/python3.9/site-packages/openmm/app/simulation.py", line 101, in __init__
self.context = mm.Context(self.system, self.integrator, platform)
File "/home/gridsan/jairas/miniconda3/envs/torch_omm8b/lib/python3.9/site-packages/openmm/openmm.py", line 3530, in __init__
_openmm.Context_swiginit(self, _openmm.new_Context(*args))
openmm.OpenMMException: Error loading CUDA module: CUDA_ERROR_UNSUPPORTED_PTX_VERSION (222)
Given the similarities between how CUDA is installed on the cluster I use and the setup discussed in https://github.com/openmm/openmm-torch/issues/88#issuecomment-1310625318, I suspect the solution to this problem might lie somewhere there.
This sounds like an issue with the CUDA toolkit version; see this issue from OpenMM: 3585. You will need to find out what driver and CUDA version are installed on the cluster you are using, probably by running `nvidia-smi` on a compute node, and then tell conda to install a compatible cudatoolkit, e.g.:

```
mamba install -c conda-forge openmm cudatoolkit=10.X
```
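As a rough additional check from Python (an assumed diagnostic, not something specified in this thread), you can see which CUDA toolkit the environment's PyTorch build targets and whether the installed driver can actually initialize it; a mismatch here usually points in the same direction as the PTX error above:

```python
import torch

# CUDA toolkit version this environment's PyTorch build was compiled against
print(torch.version.cuda)

# False (or an initialization error) typically indicates a driver/toolkit mismatch
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```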