ommprotocol
CUDA error (700)
Hi!
I'm having trouble running MD simulations on a metalloprotein (LmrR, 139k atoms in the system) with OMMprotocol. I keep getting this CUDA error (700) at random, and sometimes I have to restart the simulation several times before it finishes. In other cases I'm lucky and it finishes without problems. It's quite annoying. Do you happen to know of any solutions for this error?
Thanks, BBrouwer
An error occurred: Error invoking kernel: CUDA error (700)
Saving state...
FAILED :(
Traceback (most recent call last):
File "/QFsoft/applic/python/conda/envs/openmm-7.1/bin/ommprotocol", line 11, in
This is more related to OpenMM than to ommprotocol itself. We have seen this error several times, for several reasons, but it's normally due to driver/CUDA runtime incompatibilities, faulty GPUs, etc.
See:
- https://github.com/pandegroup/openmm/issues/1728
- https://github.com/pandegroup/openmm/issues/820
I think I included a patch to set DisablePmeStream by default, but if not, you can specify it manually in this part of your input:
platform_properties:
    Precision: mixed
    DisablePmeStream: 'true'
(quotes are needed because OpenMM expects a str not a bool)
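For a plain OpenMM script rather than an ommprotocol input file, the same properties can be passed when the Simulation is created. This is a minimal sketch, assuming the CUDA platform is available. `cuda_platform_properties` and `build_simulation` are hypothetical helper names, not part of OpenMM or ommprotocol.

```python
def cuda_platform_properties(precision="mixed", disable_pme_stream=True):
    """Build the property dict for the CUDA platform.

    OpenMM expects every property value as a string, hence 'true'/'false'
    instead of Python booleans.
    """
    return {
        "Precision": str(precision),
        "DisablePmeStream": "true" if disable_pme_stream else "false",
    }


def build_simulation(topology, system, integrator):
    """Hypothetical helper: create a Simulation on CUDA with the properties above."""
    try:
        # Newer OpenMM releases use the top-level 'openmm' package.
        from openmm import Platform
        from openmm.app import Simulation
    except ImportError:
        # Older releases (as in the tracebacks in this thread) use 'simtk'.
        from simtk.openmm import Platform
        from simtk.openmm.app import Simulation

    platform = Platform.getPlatformByName("CUDA")
    return Simulation(topology, system, integrator, platform,
                      cuda_platform_properties())
```

With this, `build_simulation(topology, system, integrator)` would replace a direct `Simulation(...)` call in a script. Whether it avoids error 700 depends on the underlying cause.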
Thank you!
To be safe, I will include DisablePmeStream: 'true' and see whether it helps.
Hi,
Unfortunately, I am reproducing this old error while running MD simulations using OpenMM on Summit. Minimization finishes successfully but simulation does not start. I installed OpenMM using the instructions at https://github.com/inspiremd/conda-recipes-summit. My error message looks like this:
min step 0: 10.0
min end: 0.0
eq1 start: 1.0
Traceback (most recent call last):
  File "../sim.py", line 75, in <module>
    simulation.step(1500)
  File "/ccs/home/apbhati/miniconda/envs/openmm/lib/python3.7/site-packages/simtk/openmm/app/simulation.py", line 132, in step
    self._simulate(endStep=self.currentStep+steps)
  File "/ccs/home/apbhati/miniconda/envs/openmm/lib/python3.7/site-packages/simtk/openmm/app/simulation.py", line 197, in _simulate
    self.integrator.step(10) # Only take 10 steps at a time, to give Python more chances to respond to a control-c.
  File "/ccs/home/apbhati/miniconda/envs/openmm/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 9475, in step
    return _openmm.LangevinIntegrator_step(self, steps)
Exception: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
terminate called after throwing an instance of 'OpenMM::OpenMMException'
what(): Error deleting array bondParams: CUDA_ERROR_ILLEGAL_ADDRESS (700)
Aborted (core dumped)
This is the first time I am getting such an error while trying to add harmonic restraints using CustomExternalForce. Interestingly, I do not get this error when I comment out all the lines related to harmonic restraints using the exact same input files. I am not sure what is causing it and how to get rid of it. I tried including DisablePmeStream: 'true' in my input script, but that does not help. Can anyone please help me with this?
I have attached my script. Thank you.
Best, Agastya
Attachment: openmm.zip
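Since the error only appears when the restraint lines are active, it may be worth checking the restraint inputs before the force is added. The sketch below is not a diagnosis of the attached script. It assumes a common way of writing harmonic positional restraints with CustomExternalForce (reference positions in nm, k in kJ/mol/nm^2). `check_restraint_inputs` and `add_harmonic_restraints` are hypothetical helper names.

```python
import math


def check_restraint_inputs(n_particles, indices, ref_positions_nm):
    """Return a list of problems found in the restraint inputs (empty if none)."""
    problems = []
    if len(indices) != len(ref_positions_nm):
        problems.append("indices and reference positions differ in length")
    for i in indices:
        if not 0 <= i < n_particles:
            problems.append("particle index %d is out of range" % i)
    for i, pos in zip(indices, ref_positions_nm):
        if len(pos) != 3 or not all(math.isfinite(c) for c in pos):
            problems.append("reference position for particle %d is invalid" % i)
    return problems


def add_harmonic_restraints(system, indices, ref_positions_nm, k=1000.0):
    """Hypothetical helper: add harmonic positional restraints to an OpenMM System."""
    problems = check_restraint_inputs(system.getNumParticles(),
                                      indices, ref_positions_nm)
    if problems:
        raise ValueError("; ".join(problems))

    try:
        from openmm import CustomExternalForce
    except ImportError:
        # Older releases (as in the traceback above) use 'simtk'.
        from simtk.openmm import CustomExternalForce

    force = CustomExternalForce("0.5*k*((x-x0)^2+(y-y0)^2+(z-z0)^2)")
    force.addGlobalParameter("k", k)  # assumed units: kJ/mol/nm^2
    for name in ("x0", "y0", "z0"):
        force.addPerParticleParameter(name)
    for i, pos in zip(indices, ref_positions_nm):
        force.addParticle(int(i), [float(c) for c in pos])
    system.addForce(force)
    return force
```

The force should be added to the System before the Simulation is created, because a force added afterwards is not picked up by an existing Context unless it is reinitialized.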