Error when testing my model using config files
Dear Schnetpack developers,
I'm encountering the following issue when testing my model using a config.yaml file:
```
Traceback (most recent call last):
File "/projappl/bandeira/schnet_env/lib64/python3.12/site-packages/schnetpack/cli.py", line 179, in train
trainer.test(model=task, datamodule=datamodule, ckpt_path="best")
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 775, in test
return call._call_and_handle_interrupt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py", line 47, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 817, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 1012, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 1049, in _run_stage
return self._evaluation_loop.run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/loops/utilities.py", line 179, in _decorator
return loop_run(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 138, in run
batch, batch_idx, dataloader_idx = next(data_fetcher)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/loops/fetchers.py", line 134, in __next__
batch = super().__next__()
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/loops/fetchers.py", line 61, in __next__
batch = next(self.iterator)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/utilities/combined_loader.py", line 341, in __next__
out = next(self._iterator)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/utilities/combined_loader.py", line 142, in __next__
out = next(self.iterators[0])
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib64/python3.12/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib64/python3.12/site-packages/torch/utils/data/dataloader.py", line 1455, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib64/python3.12/site-packages/torch/utils/data/dataloader.py", line 1505, in _process_data
data.reraise()
File "/usr/local/lib64/python3.12/site-packages/torch/_utils.py", line 733, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 3.
Original Traceback (most recent call last):
File "/usr/local/lib64/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib64/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/projappl/bandeira/schnet_env/lib64/python3.12/site-packages/schnetpack/data/atoms.py", line 270, in __getitem__
props = self._get_properties(
^^^^^^^^^^^^^^^^^^^^^
File "/projappl/bandeira/schnet_env/lib64/python3.12/site-packages/schnetpack/data/atoms.py", line 339, in _get_properties
row = conn.get(idx + 1)
^^^^^^^^^^^^^^^^^
File "/projappl/bandeira/schnet_env/lib64/python3.12/site-packages/ase/db/core.py", line 531, in get
raise KeyError('no match')
KeyError: 'no match'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```
I am using ase==3.25.0, schnetpack==2.1.1 and torch==2.6.0+cu124. Could you help me understand what is going on? I tried simply changing `row = conn.get(idx + 1)` to `row = conn.get(idx)` and the model test worked. Could that become a source of problems later, or cause a mismatch in the dataset indices?
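In case it helps, the lookup failure can also be triggered directly with ASE (a minimal sketch; "dataset.db" is a placeholder for the database my config points to):

```python
from ase.db import connect

conn = connect("dataset.db")  # placeholder for the database used in my config
n_rows = conn.count()
print("rows in database:", n_rows)

row = conn.get(n_rows)      # last valid id (ase ids are 1-based): works
row = conn.get(n_rows + 1)  # one past the end: raises KeyError: 'no match'
```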
Yours faithfully,
Hi @MasterLucas,
it is hard to tell what is going on based on this information alone, except that you seem to be loading an item from the database that does not exist. How did you generate the splits?
And could you please check the following (a short sketch for doing so is below the list):
- min and max id for train, val and test
- length of the database
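One way to check this (a rough sketch, assuming the split file is the split.npz written by the datamodule with train_idx / val_idx / test_idx arrays; adjust both paths to your setup):

```python
import numpy as np
from ase.db import connect

split = np.load("split.npz")  # path of your split file
for name in ("train_idx", "val_idx", "test_idx"):
    idx = split[name]
    print(f"{name}: min={idx.min()}, max={idx.max()}, count={len(idx)}")

conn = connect("dataset.db")  # path of your database
print("database length:", conn.count())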
The schnetpack split files are 0-based, but the ase db is 1-based.
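In other words, roughly (just illustrating the convention, with a placeholder row count N):

```python
# Illustration of the indexing convention (N = number of rows in the database):
N = 10  # example value

split_indices = list(range(N))             # what schnetpack expects: 0 .. N - 1
ase_ids = [i + 1 for i in split_indices]   # what conn.get(idx + 1) looks up: 1 .. N

# A split file that already contains 1-based ids (1 .. N) makes its largest
# entry request ase id N + 1, which does not exist -> KeyError: 'no match'.
# Dropping the "+ 1" hides that, but shifts every sample by one and breaks on
# split index 0 instead (ase id 0 does not exist).
```

If your split indices turn out to be 1-based, I would regenerate the split file rather than change the lookup in atoms.py.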