🐛 [BUG] Error during training with training set of different cell size

Open gshs12051 opened this issue 3 years ago • 1 comments

The environment I used:

OS : CentOS
python version : 3.8
nequip version : 0.5.4
e3nn version : 0.4.4
pytorch version : 1.10.1
cuda version : 11.2

For training, I tried to use a training set containing structures with different cell sizes (for example, some frames with 120 atoms and some with 60 atoms). Training then stopped with the errors below.

instantiate NpzDataset
   optional_args :                                         key_mapping
   optional_args :                                npz_fixed_field_keys
   optional_args :                                                root
   optional_args :                                  extra_fixed_fields <-                         dataset_extra_fixed_fields
   optional_args :                                           file_name <-                                  dataset_file_name
...NpzDataset_param = dict(
...   optional_args = {'key_mapping': {'z': 'atomic_numbers', 'E': 'total_energy', 'F': 'forces', 'R': 'pos'}, 'include_keys': [], 'npz_fixed_field_keys': ['atomic_numbers'], 'file_name': './train_set.npz', 'url': None, 'force_fixed_keys': [], 'extra_fixed_fields': {'r_max': 4.0}, 'include_frames': None, 'root': 'results/GeSe2'},
...   positional_args = {'type_mapper': <nequip.data.transforms.TypeMapper object at 0x2b9f505d7490>})
Traceback (most recent call last):
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/auto_init.py", line 232, in instantiate
    instance = builder(**positional_args, **final_optional_args)
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 681, in __init__
    super().__init__(
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 123, in __init__
    super().__init__(root=root, transform=type_mapper)
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/torch_geometric/dataset.py", line 90, in __init__
    self._process()
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/torch_geometric/dataset.py", line 175, in _process
    self.process()
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 269, in process
    data_list = [
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 270, in <listcomp>
    constructor(**{**{f: v[i] for f, v in fields.items()}, **fixed_fields})
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 326, in from_points
    return cls(edge_index=edge_index, pos=torch.as_tensor(pos), **kwargs)
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 221, in __init__
    _process_dict(kwargs)
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 163, in _process_dict
    raise ValueError(
ValueError: atomic_numbers is a node field but has the wrong dimension torch.Size([72, 1])

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gshs12051/anaconda3/envs/pytorch/bin/nequip-train", line 8, in <module>
    sys.exit(main())
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/scripts/train.py", line 74, in main
    trainer = fresh_start(config)
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/scripts/train.py", line 177, in fresh_start
    dataset = dataset_from_config(config, prefix="dataset")
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/_build.py", line 78, in dataset_from_config
    instance, _ = instantiate(
  File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/auto_init.py", line 234, in instantiate
    raise RuntimeError(
RuntimeError: Failed to build object with prefix `dataset` using builder `NpzDataset`

Jul 19 '22 18:07 gshs12051

NpzDataset, and indeed the npz format from numpy itself, requires all arrays to be rectangular (i.e. all frames must have the same number of atoms). For a variable number of atoms, and indeed in most cases, we recommend using dataset: ase with the extxyz format.
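As a rough illustration of why extxyz suits mixed-size data, here is a minimal sketch that formats frames with different atom counts as extended XYZ text using only plain Python. The function name `format_extxyz_frame`, the frame dictionary keys, and all numbers are made up for illustration. The header layout follows the usual extended XYZ convention (a `Lattice` entry, a `Properties` entry, and an `energy` entry), and fully periodic cells are assumed. If ASE is installed, writing `ase.Atoms` objects with `ase.io.write` is the more common route.

```python
def format_extxyz_frame(symbols, positions, forces, cell, energy):
    """Return one extended-XYZ frame as a string (hypothetical helper)."""
    if not (len(symbols) == len(positions) == len(forces)):
        raise ValueError("symbols, positions and forces need one entry per atom")
    # Flatten the 3x3 cell row by row, as the Lattice entry expects.
    lattice = " ".join(f"{x:.8f}" for row in cell for x in row)
    header = (
        f'Lattice="{lattice}" '
        "Properties=species:S:1:pos:R:3:forces:R:3 "
        f'energy={energy:.8f} pbc="T T T"'  # assumption: periodic in all directions
    )
    lines = [str(len(symbols)), header]
    for sym, pos, frc in zip(symbols, positions, forces):
        values = " ".join(f"{v:.8f}" for v in (*pos, *frc))
        lines.append(f"{sym} {values}")
    return "\n".join(lines) + "\n"


# Toy frames with made-up numbers: one with 2 atoms, one with 3 atoms.
cell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
frames = [
    dict(symbols=["Ge", "Se"],
         positions=[[0.0, 0.0, 0.0], [2.3, 0.0, 0.0]],
         forces=[[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]],
         cell=cell, energy=-1.5),
    dict(symbols=["Ge", "Se", "Se"],
         positions=[[0.0, 0.0, 0.0], [2.3, 0.0, 0.0], [0.0, 2.3, 0.0]],
         forces=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
         cell=cell, energy=-2.5),
]

# Each frame carries its own atom count, so frames of different sizes
# can simply be concatenated into one file.
text = "".join(format_extxyz_frame(**frame) for frame in frames)
```

Writing `text` to a file and pointing dataset_file_name at it, together with dataset: ase, would be the intended use. Check the exact config options against the NequIP example configs for your version.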

This particular error indicates that the number of atomic_numbers entries you provide does not match the number of atoms, which is inferred from the positions you provide.
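To see which frames trigger this kind of mismatch, a small check along the following lines can help. It is only a sketch in plain Python, not NequIP's own validation: the function name `mismatched_frames` and the toy data are hypothetical, and it assumes atomic_numbers is a single list shared by all frames, as it is when listed under npz_fixed_field_keys.

```python
def mismatched_frames(positions, atomic_numbers):
    """Return (frame_index, n_atoms) for every frame whose atom count
    differs from the length of the shared atomic_numbers list."""
    n_expected = len(atomic_numbers)
    return [
        (i, len(frame))
        for i, frame in enumerate(positions)
        if len(frame) != n_expected
    ]


# Toy data with made-up numbers: frame 0 has 2 atoms, frame 1 has 3 atoms,
# but only one atomic_numbers list (3 entries) is given for both.
positions = [
    [[0.0, 0.0, 0.0], [2.3, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [2.3, 0.0, 0.0], [0.0, 2.3, 0.0]],
]
atomic_numbers = [32, 34, 34]

bad = mismatched_frames(positions, atomic_numbers)
print(bad)
```

Any frame reported here has a different number of positions than atomic numbers, which is the inconsistency the ValueError is complaining about.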

See other discussions, for example:

https://github.com/mir-group/nequip/discussions/228
https://github.com/mir-group/nequip/discussions/137
https://github.com/mir-group/nequip/discussions/200

Jul 19 '22 18:07 Linux-cpp-lisp