xtb-python icon indicating copy to clipboard operation
xtb-python copied to clipboard

Memory Overflow of the ASE calculator

Open jan-janssen opened this issue 2 years ago • 3 comments

Describe the bug

I try to calculate the xtb energies of the QM9 database - 133885 structures. But at 6000 structures the memory usage exceeds 32GB, resulting in my calculation to crash. This can be solved by copying the ASE atoms object before every evaluation.

To Reproduce

Example workflow, which fails:

from ase.io import read
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
from xtb.ase.calculator import XTB


def calc_energy(structure):
    structure.calc = XTB(method="GFN2-xTB")
    eng_pot = structure.get_potential_energy()
    return eng_pot


if __name__ == "__main__":
    path = "/home/janssen/pyiron/projects/2023-02-19-xtb-high-throughput/dsgdb9nsd"
    structure_lst = [read(os.path.join(path, f), format="xyz") for f in os.listdir(path)]
    potential_energy_lst = [calc_energy(structure=structure) for structure in tqdm(structure_lst)]
    print("Done")

Expected behaviour

This is the working example:

from ase.io import read
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
from xtb.ase.calculator import XTB


def calc_energy(structure):
    structure.calc = XTB(method="GFN2-xTB")
    eng_pot = structure.get_potential_energy()
    return eng_pot


if __name__ == "__main__":
    path = "/home/janssen/pyiron/projects/2023-02-19-xtb-high-throughput/dsgdb9nsd"
    structure_lst = [read(os.path.join(path, f), format="xyz") for f in os.listdir(path)]
    potential_energy_lst = [calc_energy(structure=structure.copy()) for structure in tqdm(structure_lst)]
    print("Done")

In particular I changed the line:

< potential_energy_lst = [calc_energy(structure=structure) for structure in tqdm(structure_lst)]
> potential_energy_lst = [calc_energy(structure=structure.copy()) for structure in tqdm(structure_lst)]

Additional context

I tried both cache_api=True and cache_api=False still this does not fix the issue. A similar option would be great to achieve the same reset I get from copying the atoms object.

jan-janssen avatar Feb 19 '23 23:02 jan-janssen

you keep initializing new calc instances each time. structure.calc = XTB(method="GFN2-xTB") try to reuse the same calc instance will help. but this seems to indicate there is an issue with GC.

SingletC avatar Jun 15 '23 13:06 SingletC

Note that you modify each structure in place creating a new calculator object and associate it with the structure. The API objects cannot be cleaned up by the GC because references to the respective objects still exist.

awvwgk avatar Jun 15 '23 14:06 awvwgk

I can confirm reuse the calc object can avoid memory problems. The below code has no issue.

from ase.atoms import Atoms
from xtb.ase.calculator import XTB
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
atoms.calc = XTB(method="GFN2-xTB")
for _ in range(100000):
    atoms.rattle()  
    atoms.get_forces()

However, this code demonstrate there is indeed gc issue:

from ase.atoms import Atoms
from xtb.ase.calculator import XTB
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
for _ in range(100000):
    atoms.rattle()  
    atoms.calc = XTB(method="GFN2-xTB")
    atoms.get_forces()

Note that it is a slightly different approach compared to the op. the same atoms was used each time. The old instance of XTB calculator should be GCed. Although it is arguably not the best approach, it indeed consumes more and more memory in my test environment:

python3.8 
ase=3.22.1 
xtb==22.1 

an additional test to verify that is not ase's issue. The below code does not consume more and more memory.

from ase.atoms import Atoms
from ase.calculators.lj import LennardJones
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
for _ in range(10000):
    atoms.rattle()  
    atoms.calc = LennardJones()
    atoms.get_forces()

Correct me if I did something wrong.

SingletC avatar Jun 18 '23 23:06 SingletC