issue when using nequip-deploy 🐛 [BUG]
Describe the bug: issue when using nequip-deploy
To Reproduce: nequip-deploy build --train-dir model_path/ model_path/deployed_model.pth
ERROR:
[W init.cpp:833] Warning: Use _jit_set_fusion_strategy, bailout depth is deprecated. Setting to (STATIC, 2) (function operator())
/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
warnings.warn("The TorchScript type system doesn't support "
Traceback (most recent call last):
File "/home/anaconda3/envs/bebam/bin/nequip-deploy", line 8, in <module>
sys.exit(main())
File "/home/nequip/nequip/nequip/scripts/deploy.py", line 225, in main
model = _compile_for_deploy(model)
File "/home/nequip/nequip/nequip/scripts/deploy.py", line 62, in _compile_for_deploy
model = script(model)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 266, in script
out = compile(mod, in_place=in_place)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 101, in compile
compile(
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 113, in compile
mod = torch.jit.script(mod, **script_options)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 1284, in script
return torch.jit._recursive.create_script_module(
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 480, in create_script_module
return create_script_module_impl(nn_module, concrete_type, stubs_fn)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 614, in _construct
init_fn(script_module)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 520, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 546, in create_script_module_impl
create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 397, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 867, in try_compile_fn
return torch.jit.script(fn, _rcb=rcb)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 1338, in script
ast = get_jit_def(obj, obj.__name__)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 297, in get_jit_def
return build_def(parsed_def.ctx, fn_def, type_line, def_name, self_name=self_name, pdt_arg_types=pdt_arg_types)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 335, in build_def
param_list = build_param_list(ctx, py_def.args, self_name, pdt_arg_types)
File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 359, in build_param_list
raise NotSupportedError(ctx_range, _vararg_kwarg_err)
torch.jit.frontend.NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults:
File "/home/anaconda3/envs/bebam/lib/python3.10/logging/__init__.py", line 2131
def debug(msg, *args, **kwargs):
~~~~~~~ <--- HERE
"""
Log a message with severity 'DEBUG' on the root logger. If the logger has
This looks like you've edited the code to include logging.debug calls in the model?
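For context, the frontend failure itself is independent of nequip: logging.debug is defined with *args/**kwargs, which TorchScript cannot compile, so any logging.debug call reached during scripting fails exactly like this. A quick stdlib check shows the offending signature:

```python
import inspect
import logging

# The traceback bottoms out in the stdlib: logging.debug is defined as
# `def debug(msg, *args, **kwargs)`. TorchScript's frontend refuses any
# function with *args/**kwargs, which is the NotSupportedError above,
# so the question is what in the model references logging.debug.
params = inspect.signature(logging.debug).parameters.values()
kinds = {p.kind for p in params}

has_varargs = inspect.Parameter.VAR_POSITIONAL in kinds
has_kwargs = inspect.Parameter.VAR_KEYWORD in kinds
print(has_varargs, has_kwargs)  # True True
```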
Not that, but I have set up my Python environment for nequip so that I can use PyTorch 2.0 (unlike the prescribed PyTorch >= 1.8, !=1.9, <=1.11.*), due to hardware constraints and to try a few other things with torch_geometric. I am able to train nequip models in this setup, but when trying to deploy a model I get this error. My goal is to run an MD simulation on a trained model, and I thought I could use NequIPCalculator.from_deployed_model(model, **kwargs). Is there a workaround so that I can run the MD simulation without having to deploy the model?
I see; what are the hardware constraints? Please note the following upstream issue: https://github.com/mir-group/nequip/discussions/311. Whether or not you encounter that issue, please post in that thread so we can continue trying to understand and resolve the problem. Also note that on AMD GPUs, more recent versions of PyTorch appear to be fine.
Regarding torch_geometric, that is no longer a dependency of nequip, but maybe I am misinterpreting what you mean.
You could try 1.13? I've never seen this issue reported before... besides your PyTorch version, is there anything else custom or unusual about your setup? There should never be a call to logging.debug in the model. Maybe the rest of the stack trace, which isn't included here, says where in the model it is?
Thank you, will try and get back with more details.
Can you please answer this: "My goal is to run an MD simulation on a trained model, and I thought I could use NequIPCalculator.from_deployed_model(model, **kwargs). Is there a workaround so that I can run the MD simulation without having to deploy the model?"
Actually, I have already trained quite a number of models, and since nequip-deploy is not working for them, I am looking for a workaround to complete my study without having to set things up again.
You could do inefficient MD by manually constructing a NequIPCalculator from an uncompiled PyTorch model (built using model_from_config and .load_state_dict and then passed to the constructor, rather than via from_deployed_model). This will lose you performance in a lot of places, however.
It is not possible to do MD in LAMMPS, OpenMM, etc. without deploying.
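A rough, untested sketch of that workaround (model_from_config, .load_state_dict, and NequIPCalculator are the names mentioned above; the config/checkpoint file names, Config.from_file, and the constructor keywords are assumptions to check against your installed nequip version):

```python
import torch
from nequip.ase import NequIPCalculator
from nequip.model import model_from_config
from nequip.utils import Config

# Rebuild the architecture from the training config
# (file names inside the train dir are assumptions).
config = Config.from_file("model_path/config.yaml")
model = model_from_config(config, initialize=False)

# Load the trained weights from the training checkpoint.
state = torch.load("model_path/best_model.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Pass the uncompiled model to the constructor instead of
# using NequIPCalculator.from_deployed_model(...).
calc = NequIPCalculator(
    model=model,
    r_max=float(config["r_max"]),
    device="cpu",
)
```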
> Thank you, will try and get back with more details.
Thanks. It's possible that there is a missing @torch.jit.unused, in which case a quick code change will make it possible for you to deploy everything without retraining. (In general most code and version changes will not require retraining.)
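For illustration, the @torch.jit.unused pattern looks roughly like this (a generic sketch, not nequip's actual code; the module and method names are made up):

```python
import logging
import torch

class Model(torch.nn.Module):
    # Hypothetical helper that calls logging.debug. Without
    # @torch.jit.unused, torch.jit.script would try to compile it and
    # hit the same NotSupportedError as in the traceback above, because
    # logging.debug takes *args/**kwargs.
    @torch.jit.unused
    def log_stats(self, x: torch.Tensor) -> None:
        logging.debug("input mean: %s", x.mean().item())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # torch.jit.is_scripting() is a compile-time constant under
        # scripting, so this branch is dropped and log_stats is never
        # compiled; in eager mode the debug logging still runs.
        if not torch.jit.is_scripting():
            self.log_stats(x)
        return x * 2.0

scripted = torch.jit.script(Model())
out = scripted(torch.ones(3))
print(out)  # tensor([2., 2., 2.])
```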
> You could do inefficient MD by manually constructing a NequIPCalculator from an uncompiled PyTorch model (built using model_from_config and .load_state_dict and then passed to the constructor, rather than via from_deployed_model).
Do I need to modify the calculate function in "class NequIPCalculator(Calculator)" if I use an uncompiled PyTorch model?
No, you shouldn't need to.
cool