fairchem main.py issues/improvements for mass inference with an ase-db with a checkpoint

It isn't obvious how to do mass inference on an ase-db with main.py.

Something like:

python main.py --mode predict --checkpoint gnoc_oc22_oc20_all_s2ef.pt --task.dataset=ase_db --test_dataset.src=data.db

should be sufficient, I think, but this fails because --config-yml is a required argument. Given the checkpoint, I don't think it should be required: at best it duplicates the model information, and at worst it could be inconsistent with what is in the checkpoint. For predictions, it is hard to see why you would change the model.
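To illustrate what I have in mind, here is a minimal sketch of deriving a prediction-only config from the checkpoint itself. It assumes the checkpoint is a dict that carries its training config under a "config" key; the function name `predict_config_from_checkpoint` is hypothetical and not part of fairchem, and the dataset keys simply mirror the ones used in this issue. A plain dict stands in for a loaded checkpoint so the example runs on its own.

```python
import copy


def predict_config_from_checkpoint(checkpoint, src, dataset_type="ase_db"):
    """Hypothetical helper (not a fairchem API).

    Assumes `checkpoint` is a dict holding the training config under "config".
    The model section is reused untouched; only the data source is replaced.
    """
    # Deep copy so the checkpoint's own config is never mutated.
    config = copy.deepcopy(checkpoint.get("config", {}))

    # Mirror the --task.dataset override from the command above.
    config.setdefault("task", {})["dataset"] = dataset_type

    # Drop any train/val entries and point only at the data to predict on.
    # Prediction only, so no energy/force labels are read from the source.
    config["dataset"] = {
        "test": {
            "src": src,
            "a2g_args": {"r_energy": False, "r_forces": False},
        }
    }
    return config


# Stand-in for a loaded checkpoint; the contents here are made up.
fake_checkpoint = {
    "config": {
        "model": {"name": "stand-in-model"},
        "dataset": {"train": {"src": "train.db"}},
    }
}

predict_config = predict_config_from_checkpoint(fake_checkpoint, "data.db")
```

With something like this, the model settings can never drift from what the checkpoint was trained with, and the only thing the user supplies is the data source.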

Even when I make a config.yml file, though, it appears you have to populate both the train and test datasets:

                           'dataset.train.a2g_args.r_energy': False,
                           'dataset.train.a2g_args.r_forces': False,
                            # Test data - prediction only so no regression
                           'dataset.test.src': 'data.db',
                           'dataset.test.a2g_args.r_energy': False,
                           'dataset.test.a2g_args.r_forces': False,

                          })

or you get an error:

  File "/home/jovyan/shared-scratch/jkitchin/tutorial/ocp-tutorial/fine-tuning/ocp/ocpmodels/trainers/base_trainer.py", line 344, in load_datasets
    if self.normalizer.get("normalize_labels", False):
AttributeError: 'NoneType' object has no attribute 'get'

This also doesn't make sense to me. I think you should only need to specify the source you want to make predictions from.
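The traceback above suggests that `self.normalizer` is `None` when no train dataset is configured, so the `.get` call fails. As a rough sketch of a None-tolerant version of that check (written as a standalone function with a hypothetical name, not the actual trainer code or a proposed patch):

```python
def wants_label_normalization(normalizer):
    """Return True only if a normalizer config exists and enables normalize_labels.

    `normalizer` may be None (e.g. no train dataset configured) or a dict.
    """
    # Fall back to an empty dict so a missing normalizer means "no normalization".
    return bool((normalizer or {}).get("normalize_labels", False))


no_normalizer = wants_label_normalization(None)
with_normalizer = wants_label_normalization({"normalize_labels": True})
```

A guard like this would let a test-only config get past the check, though whether skipping normalization is the right behavior for a given checkpoint is a separate question.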

I guess this isn't specific to ase-db; it also applies to other data sources like lmdb.

Jul 21 '23 15:07 jkitchin

Adding to OCP 2.0 planned changes #520

Jul 27 '23 16:07 mshuaibii

This issue has been marked as stale because it has been open for 30 days with no activity.

Aug 27 '23 00:08 github-actions[bot]

@lbluque Can we close this? Do we have an equivalent option in the new CLI now?

Jul 10 '25 13:07 zulissimeta