xtb-python
                                
                                 xtb-python copied to clipboard
                                
                                    xtb-python copied to clipboard
                            
                            
                            
                        Memory Overflow of the ASE calculator
Describe the bug
I try to calculate the xtb energies of the QM9 database - 133885 structures. But at 6000 structures the memory usage exceeds 32GB, resulting in my calculation to crash. This can be solved by copying the ASE atoms object before every evaluation.
To Reproduce
Example workflow, which fails:
from ase.io import read
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
from xtb.ase.calculator import XTB
def calc_energy(structure):
    structure.calc = XTB(method="GFN2-xTB")
    eng_pot = structure.get_potential_energy()
    return eng_pot
if __name__ == "__main__":
    path = "/home/janssen/pyiron/projects/2023-02-19-xtb-high-throughput/dsgdb9nsd"
    structure_lst = [read(os.path.join(path, f), format="xyz") for f in os.listdir(path)]
    potential_energy_lst = [calc_energy(structure=structure) for structure in tqdm(structure_lst)]
    print("Done")
Expected behaviour
This is the working example:
from ase.io import read
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
from xtb.ase.calculator import XTB
def calc_energy(structure):
    structure.calc = XTB(method="GFN2-xTB")
    eng_pot = structure.get_potential_energy()
    return eng_pot
if __name__ == "__main__":
    path = "/home/janssen/pyiron/projects/2023-02-19-xtb-high-throughput/dsgdb9nsd"
    structure_lst = [read(os.path.join(path, f), format="xyz") for f in os.listdir(path)]
    potential_energy_lst = [calc_energy(structure=structure.copy()) for structure in tqdm(structure_lst)]
    print("Done")
In particular I changed the line:
< potential_energy_lst = [calc_energy(structure=structure) for structure in tqdm(structure_lst)]
> potential_energy_lst = [calc_energy(structure=structure.copy()) for structure in tqdm(structure_lst)]
Additional context
I tried both cache_api=True and cache_api=False still this does not fix the issue. A similar option would be great to achieve the same reset I get from copying the atoms object.
you keep initializing new calc instances each time.
structure.calc = XTB(method="GFN2-xTB")
try to reuse the same calc instance will help.
but this seems to indicate there is an issue with GC.
Note that you modify each structure in place creating a new calculator object and associate it with the structure. The API objects cannot be cleaned up by the GC because references to the respective objects still exist.
I can confirm reuse the calc object can avoid memory problems. The below code has no issue.
from ase.atoms import Atoms
from xtb.ase.calculator import XTB
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
atoms.calc = XTB(method="GFN2-xTB")
for _ in range(100000):
    atoms.rattle()  
    atoms.get_forces()
However, this code demonstrate there is indeed gc issue:
from ase.atoms import Atoms
from xtb.ase.calculator import XTB
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
for _ in range(100000):
    atoms.rattle()  
    atoms.calc = XTB(method="GFN2-xTB")
    atoms.get_forces()
Note that it is a slightly different approach compared to the op. the same atoms was used each time. The old instance of XTB calculator should be GCed. Although it is arguably not the best approach, it indeed consumes more and more memory in my test environment:
python3.8 
ase=3.22.1 
xtb==22.1 
an additional test to verify that is not ase's issue. The below code does not consume more and more memory.
from ase.atoms import Atoms
from ase.calculators.lj import LennardJones
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
for _ in range(10000):
    atoms.rattle()  
    atoms.calc = LennardJones()
    atoms.get_forces()
Correct me if I did something wrong.