nequip
Multiple Training / Validation Datasets 🌟 [FEATURE]
Is your feature request related to a problem? Please describe.
A common problem we run into internally is that we want to train models on individual datasets that are kept in separate files, as well as combined models trained on several of those datasets together. For example, consider a simple case with three files:
- water-only.traj
- water-NaCl.traj
- water-KCl.traj
Ideally, we should be able to read these files independently in NequIP and sample from them as if they were one file. Creating a merged file for every combination quickly becomes unwieldy and expensive in terms of disk space. The cached datasets then also have to be regenerated and stored for each merged file.
Describe the solution you'd like
A simple extension to the dataloader syntax so that it accepts a list of filenames, not just a single filename. The data would then be lumped together and used as normal. From the perspective of the "ase" dataloader, this just involves appending the contents of multiple ASE files together. As an alternative, ASE itself could be extended to read multiple files, potentially via a specialized filename, but I suspect that would get pushback from the devs (and would not result in proper caching on NequIP's end).
dataset_file_name: /mnt/public/tgmaxson/datasets/7-4-24/train.traj # Single filename
or
dataset_file_name: # Multiple filenames
- /mnt/public/tgmaxson/datasets/7-4-24/train.traj
- /mnt/public/tgmaxson/datasets/7-2-24/train.traj
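To make the request concrete, here is a minimal sketch of the "lump the files together" behaviour. It is not NequIP's actual dataloader code: the helper names `normalize_file_names`, `load_frames`, and `_read_with_ase` are hypothetical and exist only for illustration. The only real library call is `ase.io.read` with `index=":"`, which returns every frame in a file as a list.

```python
from typing import Callable, List, Sequence, Union


def normalize_file_names(dataset_file_name: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single path or a list of paths and always return a list.

    Hypothetical helper: mirrors the two config forms shown above.
    """
    if isinstance(dataset_file_name, str):
        return [dataset_file_name]
    return list(dataset_file_name)


def _read_with_ase(path: str) -> list:
    # Imported lazily so the helpers in this sketch can be used without ASE.
    from ase.io import read

    # index=":" asks ASE for all frames in the file, returned as a list.
    return read(path, index=":")


def load_frames(
    dataset_file_name: Union[str, Sequence[str]],
    reader: Callable[[str], list] = _read_with_ase,
) -> list:
    """Read every file in order and append the frames into one flat list."""
    frames: list = []
    for path in normalize_file_names(dataset_file_name):
        frames.extend(reader(path))
    return frames
```

With something like this, `load_frames(["water-only.traj", "water-NaCl.traj"])` would yield one combined list of frames without writing a merged file to disk. How the cache should be keyed for a list of files is left open here.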