Faster structure matching via `StructureMatcher` (and by extension, `group_structures`)
**Is your feature request related to a problem? Please describe.**
Using `StructureMatcher` repeatedly causes a large overhead (in my use case, ~40 hrs to check the performance of a generative benchmark via `matbench-genmetrics`).
**Describe the solution you'd like**
Explore speeding up bottlenecks in the `StructureMatcher` algorithm: pre-screening, leveraging GPUs for distributed calculation of matches, etc.
**Describe alternatives you've considered**
These are in active discussion at https://github.com/sparks-baird/matbench-genmetrics/issues/9, e.g. AFLOW's XtalFinder and `dscribe` kernels.
**Additional context**
As a follow-up to @mkhorton's https://github.com/sparks-baird/matbench-genmetrics/issues/9#issuecomment-1199738568, here are some flame graphs from repeated evaluations of `StructureMatcher().fit(...)`:
```python
%pip install matbench-genmetrics
```

```python
from mp_time_split.utils.gen import DummyGenerator
from matbench_genmetrics.core import MPTSMetrics
from tqdm import tqdm

mptm = MPTSMetrics(dummy=True, verbose=False)
for fold in tqdm(mptm.folds):
    train_val_inputs = mptm.get_train_and_val_data(fold)
    # Fit a dummy generator on the train/val split and "generate" structures.
    dg = DummyGenerator()
    dg.fit(train_val_inputs)
    gen_structures = dg.gen(n=50)
    # Benchmark evaluation: this is where StructureMatcher dominates the runtime.
    mptm.evaluate_and_record(fold, gen_structures)

print(mptm.recorded_metrics)
```
@mkhorton figured we could chat about bottlenecks and the potential for precomputation and other speedups. Here are some of my initial thoughts:
- we should be able to run `_preprocess` on all structures once and short-circuit it during later match matrix calculations
- precompute several common supercells for each structure and use a lookup instead of generating them on the fly. It's not exactly clear to me how to implement that in https://github.com/materialsproject/pymatgen/blob/6d1eae91dfbf996ed472de49ce5fe0fae985884d/pymatgen/analysis/structure_matcher.py#L469-L492
- `_cart_dists` is probably the easiest to replace or `numba`-fy, e.g. by implementing a `jit`-ed version of the Kabsch algorithm (@kjappelbaum); see the sketch after this list
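To make that last bullet concrete, here is a minimal sketch (my own code, not pymatgen's `_cart_dists`) of a `numba`-jitted Kabsch RMSD; the function name and the assumption that both inputs are `(N, 3)` `float64` arrays are mine:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) float64 coordinate sets after optimal rotation."""
    n = P.shape[0]
    # Center both point clouds on their centroids.
    Pc = P - P.sum(axis=0) / n
    Qc = Q - Q.sum(axis=0) / n
    # Covariance matrix and its SVD (supported inside numba's nopython mode).
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    # Correct for a possible reflection (det = -1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.eye(3)
    D[2, 2] = d
    R = Vt.T @ D @ U.T  # optimal rotation aligning Pc onto Qc
    diff = Pc @ R.T - Qc
    return np.sqrt((diff * diff).sum() / n)
```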
It would be good to know how much memory is taken up by each calculation and whether these could be computed in parallel on typical consumer GPUs.
Happy to move this to discussions if that's a better place.
Speedups are always welcome. But I want to note a few things:
- Hashing is already done. E.g., if the fractional compositions are not equal, two structures immediately evaluate as not equal, without any matrix comparisons.
- Most of the code is already vectorized in numpy. Any GPU optimizations that work with numpy would work with structure matcher.
If any further optimizations are implemented, it should not be at the cost of code maintainability or support for simple single CPU machines.
I should add that as implemented, StructureMatcher is meant for simple one-off comparisons. When we actually run matching across large structure sets (e.g., the entire ICSD), we use various tricks to speed it up: pre-grouping by composition, reducing to the primitive cell, etc. All of these allow us to disable some of the more costly aspects of the general structure matcher, e.g., generating supercells. Of course, the only people who really care about performance in this regard are those working with large databases of materials. That's not a huge population to begin with.
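To illustrate the pre-grouping trick, here is a minimal sketch (my own code, not the actual Materials Project pipeline) that buckets structures by reduced composition so that `StructureMatcher` only ever compares plausible pairs:

```python
from collections import defaultdict

from pymatgen.analysis.structure_matcher import StructureMatcher

def group_with_composition_prescreen(structures):
    """Group equivalent structures, pre-bucketing by reduced composition."""
    buckets = defaultdict(list)
    for struct in structures:
        # Structures with different reduced compositions can never match,
        # so they never need to reach the expensive matching step.
        buckets[struct.composition.reduced_composition].append(struct)

    # Within same-composition buckets, costlier options such as supercell
    # generation can stay disabled.
    sm = StructureMatcher(primitive_cell=True, attempt_supercell=False)
    groups = []
    for bucket in buckets.values():
        groups.extend(sm.group_structures(bucket))
    return groups
```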
Thanks for creating this issue, @sgbaird.
I agree with @shyuep's general points, but note that I don't think this is the easiest code to maintain as it is currently written either (e.g., we use custom Cython linear-assignment code that's quite difficult to understand), so I hope there's good scope for improvements!
While large-scale usage may be comparatively rare, I certainly think the number of people needing to do large numbers of comparisons is increasing, and it's often been a bottleneck.
For the specific suggestions:
> we should be able to run `_preprocess` on all structures once and short-circuit it during later match matrix calculations
Yes, this seems like an obvious optimization if this is being repeated. It's probably not the most critical piece (since it only has to be done once for each structure), but perhaps `get_niggli_reduced_lattice` can be accelerated too.
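As a hedged sketch of the "run `_preprocess` once" idea (my own code; `_preprocess` is a private method whose `(struct, niggli=True)` signature I'm assuming from the current source, so this may break across versions):

```python
from pymatgen.analysis.structure_matcher import StructureMatcher

class CachedStructureMatcher(StructureMatcher):
    """StructureMatcher that memoizes its private _preprocess step."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._preprocess_cache = {}

    def _preprocess(self, struct, niggli=True):
        # Keyed on object identity, so callers must not mutate a
        # structure between fit() calls.
        key = (id(struct), niggli)
        if key not in self._preprocess_cache:
            self._preprocess_cache[key] = super()._preprocess(struct, niggli)
        return self._preprocess_cache[key]
```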
> precompute several common supercells for each structure and use a lookup instead of generating it on the fly.
Think we'd have to try this out to see how useful it'd be in practice; it's not obvious to me how many supercells would need to be generated.
> `_cart_dists` is probably the easiest to replace or `numba`-fy, e.g. by implementing a `jit`-ed version of the Kabsch algorithm
Agreed. I do want to point out this for examples of where else optimized code in pymatgen lives (this was for neighbor finding). I note that `numba` is both excellent and a really troublesome dependency, since it typically lags behind the latest `numpy` and Python versions, so I might err towards a Cython solution if feasible. Perhaps a simpler `numpy`-vectorized solution might also be a better option -- as @shyuep notes, this is done in several places already, but perhaps we've missed an opportunity.
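For the `numpy`-vectorized route, a hedged sketch (again my own code) of the same Kabsch update batched over many coordinate pairs at once, relying on the fact that `np.linalg.svd` and `np.linalg.det` operate on stacked `(..., 3, 3)` matrices:

```python
import numpy as np

def batched_kabsch_rmsd(P, Q):
    """RMSDs for a batch of already-centered (B, N, 3) coordinate arrays."""
    H = P.transpose(0, 2, 1) @ Q  # (B, 3, 3) covariance matrices
    U, _, Vt = np.linalg.svd(H)
    # Per-pair reflection correction.
    d = np.sign(np.linalg.det(Vt.transpose(0, 2, 1) @ U.transpose(0, 2, 1)))
    D = np.tile(np.eye(3), (len(P), 1, 1))
    D[:, 2, 2] = d
    R = Vt.transpose(0, 2, 1) @ D @ U.transpose(0, 2, 1)  # (B, 3, 3) rotations
    diff = P @ R.transpose(0, 2, 1) - Q
    return np.sqrt((diff**2).sum(axis=(1, 2)) / P.shape[1])
```

The same array program would run on a GPU via a `numpy`-compatible library such as CuPy.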
Let me mention that `StructureMatcher.group_structures` calls `_preprocess` for each structure only once (https://github.com/materialsproject/pymatgen/pull/2490). So it is a small improvement, but using `StructureMatcher.group_structures` reduces the `_preprocess` part of the flame graph from O(n^2) to O(n) and is ~150% faster as a whole.
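For anyone hitting this from the benchmark use case above, the takeaway is to prefer the batch API over a pairwise loop:

```python
from pymatgen.analysis.structure_matcher import StructureMatcher

sm = StructureMatcher()
# `structures` is a list of pymatgen Structure objects. Each one is
# preprocessed once here, rather than once per pair as in a naive
# double loop over sm.fit(s1, s2).
groups = sm.group_structures(structures)  # list of lists of matching structures
```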
That's fantastic, @lan496! Sometimes the key to big speed improvements is lots of "small" optimizations; this is definitely appreciated :)