Can't run steps of dynamics with NNPOps `TorchForce`
In attempting to run MD on a `TorchForce`-equipped `System` (the `TorchForce` has the NNPOps symmetry functions equipped as described here), I am observing strange behavior. Namely, I am able to create a `Context` with the `System` and retrieve a `State` object with a potential energy, but when I run a step of dynamics, I observe the following (a minimal sketch of my setup follows the traceback):
```
Traceback (most recent call last):
File "/lila/home/rufad/github/qmlify/qmlify/openmm_torch/notebooks/yield_dynamics.py", line 119, in <module>
ml_int.step(1)
File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/simtk/openmm/openmm.py", line 7036, in step
return _openmm.CustomIntegrator_step(self, steps)
simtk.openmm.OpenMMException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/torchani/nn.py", line 95, in forward
if torch.gt((torch.size(midx))[0], 0):
input_ = torch.index_select(aev0, 0, midx)
_29 = torch.flatten((_22).forward(input_, ), 0, -1)
~~~~~~~~~~~~ <--- HERE
_30 = torch.masked_scatter_(output, mask, _29)
else:
File "code/__torch__/torch/nn/modules/container.py", line 22, in forward
_5 = getattr(self, "5")
_6 = getattr(self, "6")
input0 = (_0).forward(input, )
~~~~~~~~~~~ <--- HERE
input1 = (_1).forward(input0, )
input2 = (_2).forward(input1, )
File "code/__torch__/torch/nn/modules/linear.py", line 13, in forward
input: Tensor) -> Tensor:
_0 = __torch__.torch.nn.functional.linear
return _0(input, self.weight, self.bias, )
~~ <--- HERE
File "code/__torch__/torch/nn/functional.py", line 4, in linear
weight: Tensor,
bias: Optional[Tensor]=None) -> Tensor:
return torch.linear(input, weight, bias)
~~~~~~~~~~~~ <--- HERE
def celu(input: Tensor,
alpha: float=1.,
Traceback of TorchScript, original code (most recent call last):
File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torchani/nn.py", line 68, in forward
if midx.shape[0] > 0:
input_ = aev.index_select(0, midx)
output.masked_scatter_(mask, m(input_).flatten())
~ <--- HERE
output = output.view_as(species)
return SpeciesEnergies(species, torch.sum(output, dim=1))
File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
def forward(self, input):
for module in self:
input = module(input)
~~~~~~ <--- HERE
return input
File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 94, in forward
def forward(self, input: Tensor) -> Tensor:
return F.linear(input, self.weight, self.bias)
~~~~~~~~ <--- HERE
File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/functional.py", line 1753, in linear
if has_torch_function_variadic(input, weight):
return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
return torch._C._nn.linear(input, weight, bias)
~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
```
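For reference, the failing pattern looks roughly like this (a minimal sketch, not my exact script: the file name, masses, atom count, and positions are placeholders, and while my actual run uses a `CustomIntegrator`, any integrator that requests forces should hit the same code path):

```python
import numpy as np
from simtk import openmm, unit
from openmmtorch import TorchForce

n_atoms = 22  # placeholder

# Hypothetical serialized TorchScript module (see the ANI/NNPOps sketch below)
force = TorchForce("ani_model.pt")

# Bare System with one particle per atom and the TorchForce attached
system = openmm.System()
for _ in range(n_atoms):
    system.addParticle(12.0 * unit.amu)
system.addForce(force)

integrator = openmm.LangevinIntegrator(300.0 * unit.kelvin,
                                       1.0 / unit.picosecond,
                                       1.0 * unit.femtosecond)
platform = openmm.Platform.getPlatformByName("CUDA")
context = openmm.Context(system, integrator, platform)
context.setPositions(np.random.randn(n_atoms, 3) * 0.1)  # nm

# This works: the potential energy comes back fine...
print(context.getState(getEnergy=True).getPotentialEnergy())

# ...but a single step of dynamics raises CUBLAS_STATUS_EXECUTION_FAILED
integrator.step(1)
```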
On the other hand, if I do not equip the NNPOps ANI symmetry functions, this error is not encountered. I didn't notice any examples/pytests in this repo regarding equipping a `TorchForce` with `ANISymmetryFunctions`, so I'm not sure whether this interoperability has been tested yet. If not, would it be possible to add a pytest/example? I'm also not sure whether this should go into the openmm-torch repo instead (since the functionality I'm trying to exercise uses NNPOps). I'd be happy to help troubleshoot if needed.
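For completeness, the model I'm serializing for `TorchForce` is built along these lines (again a sketch: the `TorchANISymmetryFunctions` usage follows the NNPOps README as I understand it and its signature may differ across versions, and `ANIForce` is a hypothetical adapter matching the `forward(positions) -> energy` contract that openmm-torch expects):

```python
import torch
import torchani
from NNPOps.SymmetryFunctions import TorchANISymmetryFunctions

device = torch.device("cuda")
species = torch.tensor([[6, 1, 1, 1, 1]], device=device)  # placeholder: methane, atomic numbers

# Standard TorchANI model, with its AEV computer swapped for the NNPOps one
nnp = torchani.models.ANI2x(periodic_table_index=True).to(device)
nnp.aev_computer = TorchANISymmetryFunctions(nnp.aev_computer)

class ANIForce(torch.nn.Module):
    """Hypothetical adapter: openmm-torch calls forward(positions in nm) -> energy."""

    def __init__(self, nnp, species):
        super().__init__()
        self.nnp = nnp
        self.register_buffer("species", species)

    def forward(self, positions):
        # TorchANI expects Angstrom and a batch dimension; it returns Hartree
        positions = 10.0 * positions.unsqueeze(0).float()
        energy = self.nnp((self.species, positions)).energies[0]
        return 2625.5 * energy  # Hartree -> kJ/mol

# Serialize for TorchForce (assuming the model scripts cleanly)
module = torch.jit.script(ANIForce(nnp, species))
module.save("ani_model.pt")
```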
This seems to be a common error. This issue has lots of discussion by people encountering it.
https://github.com/NVIDIA/apex/issues/580
Here's one where the problem was fixed by upgrading to PyTorch 1.9.
https://github.com/allenai/allennlp/issues/5064
In this one it was fixed by upgrading to CUDA 11.2.
https://stackoverflow.com/questions/66600362/runtimeerror-cuda-error-cublas-status-execution-failed-when-calling-cublassge
There are many other pages discussing the same error. Often it seems related to inconsistencies in the shapes or dtypes of tensors.
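If it helps to localize the failing op outside of OpenMM, one generic approach (not specific to NNPOps) is to force synchronous kernel launches, so the Python stack trace points at the kernel that actually failed rather than a later call, and then run the serialized module directly on the same inputs the `Context` would pass:

```python
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch

# "ani_model.pt" is a placeholder for the module handed to TorchForce
module = torch.jit.load("ani_model.pt").cuda()

positions = torch.randn(22, 3, device="cuda") * 0.1  # placeholder coordinates in nm
print(module(positions))  # with blocking launches, a failure here gets an accurate traceback
```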
I noticed these too. I'll give these solutions a try and get back to you. Thanks for the sleuthing.
@dominicrufa: Is this still an active issue?