🐛 [BUG] Error during training with training set of different cell size
The environment I used:
- OS : CentOS
- python version : 3.8
- nequip version : 0.5.4
- e3nn version : 0.4.4
- pytorch version : 1.10.1
- cuda version : 11.2
I tried to train on a training set that contains structures with different cell sizes (for example, some frames with 120 atoms and some with 60 atoms). Training then ended with the error below.
instantiate NpzDataset
optional_args : key_mapping
optional_args : npz_fixed_field_keys
optional_args : root
optional_args : extra_fixed_fields <- dataset_extra_fixed_fields
optional_args : file_name <- dataset_file_name
...NpzDataset_param = dict(
... optional_args = {'key_mapping': {'z': 'atomic_numbers', 'E': 'total_energy', 'F': 'forces', 'R': 'pos'}, 'include_keys': [], 'npz_fixed_field_keys': ['atomic_numbers'], 'file_name': './train_set.npz', 'url': None, 'force_fixed_keys': [], 'extra_fixed_fields': {'r_max': 4.0}, 'include_frames': None, 'root': 'results/GeSe2'},
... positional_args = {'type_mapper': <nequip.data.transforms.TypeMapper object at 0x2b9f505d7490>})
Traceback (most recent call last):
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/auto_init.py", line 232, in instantiate
instance = builder(**positional_args, **final_optional_args)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 681, in __init__
super().__init__(
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 123, in __init__
super().__init__(root=root, transform=type_mapper)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/torch_geometric/dataset.py", line 90, in __init__
self._process()
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/torch_geometric/dataset.py", line 175, in _process
self.process()
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 269, in process
data_list = [
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 270, in <listcomp>
constructor(**{**{f: v[i] for f, v in fields.items()}, **fixed_fields})
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 326, in from_points
return cls(edge_index=edge_index, pos=torch.as_tensor(pos), **kwargs)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 221, in __init__
_process_dict(kwargs)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 163, in _process_dict
raise ValueError(
ValueError: atomic_numbers is a node field but has the wrong dimension torch.Size([72, 1])
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/gshs12051/anaconda3/envs/pytorch/bin/nequip-train", line 8, in <module>
sys.exit(main())
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/scripts/train.py", line 74, in main
trainer = fresh_start(config)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/scripts/train.py", line 177, in fresh_start
dataset = dataset_from_config(config, prefix="dataset")
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/_build.py", line 78, in dataset_from_config
instance, _ = instantiate(
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/auto_init.py", line 234, in instantiate
raise RuntimeError(
RuntimeError: Failed to build object with prefix `dataset` using builder `NpzDataset`
NpzDataset requires all arrays to be rectangular, as does the npz format from numpy itself for each array it stores (i.e. all frames must have the same number of atoms). For a variable number of atoms, and indeed in most cases, we recommend using dataset: ase with the extxyz format.
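As a rough illustration of the extxyz recommendation above, the sketch below writes frames with different atom counts into one extended XYZ stream using only the Python standard library. The `write_extxyz` helper, the frame dictionary keys, and the toy Ge/Se values are all made up for this example and are not part of nequip; in practice `ase.io.write(..., format="extxyz")` is the usual way to produce such a file.

```python
import io


def write_extxyz(stream, frames):
    """Write frames to `stream` in extended XYZ format.

    Each frame is a dict with these (hypothetical) keys:
      symbols   - chemical symbols, one per atom
      positions - [x, y, z] rows, one per atom
      forces    - [fx, fy, fz] rows, one per atom
      cell      - 3x3 nested list of lattice vectors
      energy    - total energy as a float
    """
    for frame in frames:
        n_atoms = len(frame["symbols"])
        if len(frame["positions"]) != n_atoms or len(frame["forces"]) != n_atoms:
            raise ValueError("per-atom arrays must have one row per atom")

        # Line 1 of every frame is its own atom count, so counts may differ.
        stream.write(f"{n_atoms}\n")

        # Line 2 is the comment line holding per-frame key=value metadata.
        lattice = " ".join(f"{x:.8f}" for vec in frame["cell"] for x in vec)
        stream.write(
            f'Lattice="{lattice}" '
            "Properties=species:S:1:pos:R:3:forces:R:3 "
            f'energy={frame["energy"]:.8f} pbc="T T T"\n'
        )

        # One line per atom: symbol, position, force.
        for sym, pos, frc in zip(frame["symbols"], frame["positions"], frame["forces"]):
            values = " ".join(f"{v:.8f}" for v in (*pos, *frc))
            stream.write(f"{sym} {values}\n")


# Toy data (made-up numbers): one 2-atom frame and one 3-atom frame.
cell = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
frames = [
    {
        "symbols": ["Ge", "Se"],
        "positions": [[0.0, 0.0, 0.0], [1.2, 1.2, 1.2]],
        "forces": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        "cell": cell,
        "energy": -1.0,
    },
    {
        "symbols": ["Ge", "Se", "Se"],
        "positions": [[0.0, 0.0, 0.0], [1.2, 1.2, 1.2], [3.7, 3.7, 3.7]],
        "forces": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        "cell": cell,
        "energy": -1.5,
    },
]

buffer = io.StringIO()
write_extxyz(buffer, frames)
text = buffer.getvalue()
```

Because every frame carries its own atom count, nothing here has to be padded or stacked into a single rectangular array. Writing `text` to a file and pointing `dataset_file_name` at it with `dataset: ase` is the setup the reply describes.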
This particular error indicates that the number of atomic_numbers entries you provide is inconsistent with the number of atoms, which is inferred from the number of positions you provide.
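A quick way to look for this kind of mismatch before training is to compare the per-frame counts yourself. The following is only a sketch on plain Python lists; `check_frame` is a hypothetical helper, not a nequip function, and the numbers are toy values.

```python
def check_frame(atomic_numbers, positions):
    """Return a list of problems found in one frame (empty if consistent)."""
    problems = []
    # Every atom needs exactly one atomic number and one position row.
    if len(atomic_numbers) != len(positions):
        problems.append(
            f"{len(atomic_numbers)} atomic numbers but {len(positions)} positions"
        )
    # Each position must be a 3-component vector.
    if any(len(row) != 3 for row in positions):
        problems.append("every position must have 3 components")
    return problems


# Consistent frame: 3 atomic numbers, 3 positions.
ok = check_frame([32, 34, 34], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])

# Inconsistent frame: 3 atomic numbers, only 2 positions.
bad = check_frame([32, 34, 34], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
```

Here `ok` is empty, while `bad` reports the count mismatch for the second frame.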
See other discussions, for example:
- https://github.com/mir-group/nequip/discussions/228
- https://github.com/mir-group/nequip/discussions/137
- https://github.com/mir-group/nequip/discussions/200