Exception: expected scalar type Double but found Float

Open duneandre opened this issue 1 year ago • 4 comments

Hi,

I have trained a Machine Learning Potential using NequIP version 0.5.6.

Now I want to run MD with this MLP using LAMMPS (version 29 August 2024) and pair_nequip (latest git version). I am running it on another machine, on which I have installed NequIP version 0.6.1 (the latest). The calculation did not work and I received the following error: Exception: expected scalar type Double but found Float

Do you know how to solve this problem?

Thanks for your help !

duneandre, Oct 17 '24 11:10

I am encountering this issue as well:

  • a model trained and deployed with NequIP version 0.5.6
  • trying to run on LAMMPS with latest pair_nequip from git main branch

I have found and read the previous issue at https://github.com/mir-group/pair_nequip/issues/51, which seems similar. It states that the problem should be fixed by pull request https://github.com/mir-group/pair_nequip/pull/52 from @anjohan, but I am still seeing the issue with the latest git version (which includes this commit).

Is there any configuration that can be tweaked in pair_nequip? Or any advice on how to redeploy the model?

fxcoudert, Oct 17 '24 11:10

Hello, I was wondering if you were able to solve this issue? I am also facing the "Exception: expected scalar type Double but found Float" error, with the following versions (default_dtype: float64 and model_dtype: float32 in config.yaml):

  • nequip 0.6.0
  • pair_nequip 0.5.1
  • lammps stable_29Sep2021_update2

I tried to solve it by converting the pth file to float32 based on #21 (using torch.jit, however, as I got a serialization error without it), but then I got the following error: The indicated TorchScript file does not appear to be a deployed NequIP model; did you forget to run nequip-deploy? (src/pair_nequip.cpp:181)
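
One possible reading of this second error, not confirmed anywhere in this thread: nequip-deploy stores its metadata as extra files inside the TorchScript archive, and a model that is re-saved with torch.jit.save without passing those extra files again ends up without them. The plain-Python sketch below lists whatever extra files an archive contains, so the original and the converted file can be compared. It assumes that a TorchScript file is a zip archive whose extra files live under `<archive name>/extra/`; the function name `list_extra_files` and the demo key are hypothetical.

```python
import io
import zipfile


def list_extra_files(source):
    """Return {key: text} for entries under '<archive name>/extra/'.

    `source` may be a file path or a binary file-like object.
    The '<archive name>/extra/<key>' layout is an assumption about
    how TorchScript archives store extra files.
    """
    found = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            if len(parts) >= 3 and parts[1] == "extra":
                key = "/".join(parts[2:])
                found[key] = archive.read(name).decode("utf-8", errors="replace")
    return found


# Demo on a tiny stand-in archive built in memory (not a real model);
# the key name below is a placeholder.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("model/data.pkl", b"")
    zf.writestr("model/extra/example_key", "example value")

print(list_extra_files(demo))  # {'example_key': 'example value'}
```

For a real file, calling `list_extra_files("potential.pth")` on both the deployed model and the converted one would show whether the conversion dropped any entries.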

baham2, Feb 03 '25 16:02

Hi all,

I'm experiencing a similar issue. LAMMPS compiles successfully on both my personal PC and a cluster, but it fails when I try to use NequIP in LAMMPS.

Environment details:

  • nequip 0.5.6
  • pair_nequip 0.5.2
  • Torch 1.12.1+cu11.6
  • lammps-stable_23Jun2022_update4
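
Since every report in this thread hinges on exact versions, a small helper can collect the Python-side ones for a bug report. This is only a sketch using the standard library's `importlib.metadata`; the function name `installed_versions` is hypothetical, and including `e3nn` in the default list is an assumption (it is not mentioned in the thread). pair_nequip and LAMMPS are not Python distributions, so their versions still have to be noted by hand.

```python
from importlib import metadata


def installed_versions(packages=("nequip", "torch", "e3nn")):
    """Map each distribution name to its installed version.

    Distributions that are not installed in the current environment
    are reported as 'not installed' instead of raising an error.
    """
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Print one "name: version" line per package, ready to paste into an issue.
for name, version in installed_versions().items():
    print(f"{name}: {version}")
```

Running it in the environment used for training and again in the one used for deployment makes a version difference between the two machines easy to spot.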

I've tried generating different potential.pth files by modifying the YAML configuration and retraining:

  1. default_dtype: float32
  2. default_dtype: float64
  3. the default settings as in nequip 0.6.0 (default_dtype: float64, model_dtype: float32, allow_tf32: true)

Each test results in a core dump, with the following errors:

  1. terminate called after throwing an instance of 'c10::Error' what(): expected scalar type Double but found Float
  2. RuntimeError: expected scalar type Float but found Double
  3. terminate called after throwing an instance of 'c10::Error' what(): expected scalar type Double but found Float

My LAMMPS input is:

    units metal
    dimension 3
    newton off
    boundary p p p
    box tilt large
    atom_style charge
    neighbor 2.0 bin
    read_data structure_atom.dat.q
    pair_style nequip
    pair_coeff * * ./potential.pth Sr Ti O
    run 0

The structure_atom.dat.q file is just a supercell of SrTiO3 containing 60 atoms.

I'm quite confused about what the issue might be. Any help would be greatly appreciated! Thanks in advance.

afour9961207, Feb 27 '25 16:02

Hi all---this is indeed strange that you are still seeing this issue with what should have been a fix. I'm not sure why... we should have some more patches out within some weeks that should resolve this; thanks for your patience.

Linux-cpp-lisp, Feb 28 '25 05:02