Faster structure matching via `StructureMatcher` (and by extension, `group_structures`)
**Is your feature request related to a problem? Please describe.**
Using `StructureMatcher` repeatedly causes a large overhead (in my use case, ~40 hrs to check the performance of a generative benchmark via `matbench-genmetrics`).
**Describe the solution you'd like**
Explore speeding up bottlenecks in the `StructureMatcher` algorithm: pre-screening, leveraging GPUs for distributed calculation of matches, etc.
**Describe alternatives you've considered**
These are in active discussion at https://github.com/sparks-baird/matbench-genmetrics/issues/9, e.g. AFLOW's XtalFinder and `dscribe` kernels.
**Additional context**
As a follow-up to @mkhorton's https://github.com/sparks-baird/matbench-genmetrics/issues/9#issuecomment-1199738568, here are some flame graphs from repeated evaluations of `StructureMatcher().fit(...)`:
```python
%pip install matbench-genmetrics
```

```python
from mp_time_split.utils.gen import DummyGenerator
from matbench_genmetrics.core import MPTSMetrics
from tqdm import tqdm

mptm = MPTSMetrics(dummy=True, verbose=False)
for fold in tqdm(mptm.folds):
    train_val_inputs = mptm.get_train_and_val_data(fold)
    # Fit a dummy generator on the train/val split and "generate" structures.
    dg = DummyGenerator()
    dg.fit(train_val_inputs)
    gen_structures = dg.gen(n=50)
    # Benchmark evaluation: this is where StructureMatcher dominates the runtime.
    mptm.evaluate_and_record(fold, gen_structures)

print(mptm.recorded_metrics)
```
@mkhorton figured we could chat about bottlenecks and the potential for precomputation and other speedups. Here are some of my initial thoughts:
- we should be able to run `_preprocess` on all structures once and short-circuit it during later match matrix calculations
- precompute several common supercells for each structure and use a lookup instead of generating them on the fly. It's not exactly clear to me how to implement that in https://github.com/materialsproject/pymatgen/blob/6d1eae91dfbf996ed472de49ce5fe0fae985884d/pymatgen/analysis/structure_matcher.py#L469-L492
- `_cart_dists` is probably the easiest to replace or `numba`-fy, e.g. by implementing a `jit`-ed version of the Kabsch algorithm (@kjappelbaum); see the sketch after this list
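To make that last bullet concrete, here is a minimal sketch (my own code, not pymatgen's `_cart_dists`) of a `numba`-jitted Kabsch RMSD; the function name and the assumption that both inputs are `(N, 3)` `float64` arrays are mine:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) float64 coordinate sets after optimal rotation."""
    n = P.shape[0]
    # Center both point clouds on their centroids.
    Pc = P - P.sum(axis=0) / n
    Qc = Q - Q.sum(axis=0) / n
    # Covariance matrix and its SVD (supported inside numba's nopython mode).
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    # Correct for a possible reflection (det = -1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.eye(3)
    D[2, 2] = d
    R = Vt.T @ D @ U.T  # optimal rotation aligning Pc onto Qc
    diff = Pc @ R.T - Qc
    return np.sqrt((diff * diff).sum() / n)
```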
It would be good to know how much memory is taken up by each calculation and whether these could be computed in parallel on typical consumer GPUs.
Happy to move this to discussions if that's a better place.
Speedups are always welcome. But I want to note a few things:
- Hashing is already done. E.g., if the fractional compositions are not equal, two structures immediately evaluate as not equal, without any matrix comparisons.
- Most of the code is already vectorized in numpy. Any GPU optimizations that work with numpy would work with structure matcher.
If any further optimizations are implemented, it should not be at the cost of code maintainability or support for simple single CPU machines.
I should add that as implemented, StructureMatcher is meant for simple one-off comparisons. When we actually run matching across large structure sets (e.g., the entire ICSD), we use various tricks to speed it up: pre-grouping by composition, reducing to the primitive cell, etc. All of these allow us to disable some of the more costly aspects of the general structure matcher, e.g., generating supercells. Of course, the only people who really care about performance in this regard are those working with large databases of materials. That's not a huge population to begin with.
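To illustrate the pre-grouping trick, here is a minimal sketch (my own code, not the actual Materials Project pipeline) that buckets structures by reduced composition so that `StructureMatcher` only ever compares plausible pairs:

```python
from collections import defaultdict

from pymatgen.analysis.structure_matcher import StructureMatcher

def group_with_composition_prescreen(structures):
    """Group equivalent structures, pre-bucketing by reduced composition."""
    buckets = defaultdict(list)
    for struct in structures:
        # Structures with different reduced compositions can never match,
        # so they never need to reach the expensive matching step.
        buckets[struct.composition.reduced_composition].append(struct)

    # Within same-composition buckets, costlier options such as supercell
    # generation can stay disabled.
    sm = StructureMatcher(primitive_cell=True, attempt_supercell=False)
    groups = []
    for bucket in buckets.values():
        groups.extend(sm.group_structures(bucket))
    return groups
```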
Thanks for creating this issue, @sgbaird.
I agree with @shyuep's general points, but note that I don't think this is the easiest code to maintain as it is currently written either (e.g., we use custom Cython linear-assignment code that's quite difficult to understand), so I hope there's good scope for improvements!
While large-scale usage may be comparatively rare, I certainly think the number of people needing to do large numbers of comparisons is increasing, and it's often been a bottleneck.
For the specific suggestions:
> we should be able to run `_preprocess` on all structures once and short-circuit it during later match matrix calculations
Yes, this seems like an obvious optimization if this is being repeated. It's probably not the most critical piece (since it only has to be done once for each structure), but perhaps `get_niggli_reduced_lattice` can be accelerated too.
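As a hedged sketch of the "run `_preprocess` once" idea (my own code; `_preprocess` is a private method whose `(struct, niggli=True)` signature I'm assuming from the current source, so this may break across versions):

```python
from pymatgen.analysis.structure_matcher import StructureMatcher

class CachedStructureMatcher(StructureMatcher):
    """StructureMatcher that memoizes its private _preprocess step."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._preprocess_cache = {}

    def _preprocess(self, struct, niggli=True):
        # Keyed on object identity, so callers must not mutate a
        # structure between fit() calls.
        key = (id(struct), niggli)
        if key not in self._preprocess_cache:
            self._preprocess_cache[key] = super()._preprocess(struct, niggli)
        return self._preprocess_cache[key]
```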
> precompute several common supercells for each structure and use a lookup instead of generating it on the fly.
Think we'd have to try this out to see how useful it'd be in practice; it's not obvious to me how many supercells would need to be generated.
> `_cart_dists` is probably the easiest to replace or `numba`-fy, e.g. by implementing a `jit`-ed version of the Kabsch algorithm
Agreed. I do want to point out this for examples of where else optimized code in pymatgen lives (this was for neighbor finding). I note that `numba` is both excellent and a really troublesome dependency, since it typically lags behind the latest `numpy` and Python versions, so I might err towards a Cython solution if feasible. Perhaps a simpler `numpy`-vectorized solution might also be a better option -- as @shyuep notes, this is done in several places already, but perhaps we've missed an opportunity.
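For the `numpy`-vectorized route, a hedged sketch (again my own code) of the same Kabsch update batched over many coordinate pairs at once, relying on the fact that `np.linalg.svd` and `np.linalg.det` operate on stacked `(..., 3, 3)` matrices:

```python
import numpy as np

def batched_kabsch_rmsd(P, Q):
    """RMSDs for a batch of already-centered (B, N, 3) coordinate arrays."""
    H = P.transpose(0, 2, 1) @ Q  # (B, 3, 3) covariance matrices
    U, _, Vt = np.linalg.svd(H)
    # Per-pair reflection correction.
    d = np.sign(np.linalg.det(Vt.transpose(0, 2, 1) @ U.transpose(0, 2, 1)))
    D = np.tile(np.eye(3), (len(P), 1, 1))
    D[:, 2, 2] = d
    R = Vt.transpose(0, 2, 1) @ D @ U.transpose(0, 2, 1)  # (B, 3, 3) rotations
    diff = P @ R.transpose(0, 2, 1) - Q
    return np.sqrt((diff**2).sum(axis=(1, 2)) / P.shape[1])
```

The same array program would run on a GPU via a `numpy`-compatible library such as CuPy.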
Let me mention that `StructureMatcher.group_structures` calls `_preprocess` for each structure only once (https://github.com/materialsproject/pymatgen/pull/2490). So it is a small improvement, but using `StructureMatcher.group_structures` reduces the `_preprocess` part of the flame graph from O(n^2) to O(n) and is ~150% faster as a whole.
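For anyone hitting this from the benchmark use case above, the takeaway is to prefer the batch API over a pairwise loop:

```python
from pymatgen.analysis.structure_matcher import StructureMatcher

sm = StructureMatcher()
# `structures` is a list of pymatgen Structure objects. Each one is
# preprocessed once here, rather than once per pair as in a naive
# double loop over sm.fit(s1, s2).
groups = sm.group_structures(structures)  # list of lists of matching structures
```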
That's fantastic, @lan496! Sometimes the key to big speed improvements is lots of "small" optimizations; this is definitely appreciated :)