GSoC 2025: Improve Performance Using Screening
Description
Use screening based on the overlap between basis functions to improve performance.
:books: Package Description and Impact
For large molecules, textbook expressions for quantities expanded in Gaussian basis functions (e.g., the electron density) or integrals based on Gaussian basis functions (e.g., the kinetic-energy integral) typically include many negligible terms. By screening out these terms, and only evaluating terms that are nonnegligible, the performance of GBasis can be greatly enhanced.
:construction_worker: What will you do?
In GBasis we provide a utility for screening these terms using their overlap, is_overlap_included. When this expression is small, one can also neglect other one-electron integrals. A generalization of this approach allows fast evaluation of spatial quantities and 2-electron integrals. Your main goal would be to screen 1-electron integrals and the evaluation of quantities at (grid) points using overlap screening and its generalization.
:checkered_flag: Expected Outcomes
- Adapt is_overlap_included to screen other one-electron integrals.
- Extend is_overlap_included to three functions, which allows screening spatial evaluations.
- Write tests to ensure correctness and assess performance.
- :trophy: An ambitious stretch goal is to implement screening of 2-electron integrals.
| Required skills | Python, OOP |
| Preferred skills | Comfort with math and physics. Experience with scientific programming or quantum chemistry would be a huge plus |
| Project size | 350 hours, Large |
| Difficulty | Medium 😉 |
:raising_hand: Mentors
| Marco Martínez-González | mmg870630_at_gmail_dot_com | @marco-2023 |
| Esteban Vöhringer-Martinez | estebanvohringer_at_qcmmlab_dot_com | @evohringer |
| Paul Ayers | ayers_at_mcmaster_dot_ca | @PaulWAyers |
| Gabriela Sánchez-Díaz | sanchezg_at_mcmaster_dot_ca | @gabrielasd |
📝 Notes & References
- Supersedes #167
- Related to #183, #121
- Screening is discussed in Chapter 9 of Molecular Electronic Structure Theory.
- A recent(ish) discussion from the Ochsenfeld group
- Paul's notes on overlap screening
- Paul's notes on screening in general. This includes screening for 2-electron integrals, which would be an (ambitious!) stretch goal.
Hi, can I work on this?
Of course. We have not yet heard from GSoC about approval (or not) for this year. Presuming you would like to do it as part of GSoC, you should look at the guidelines. We always have a few people who end up taking on a GSoC project without actually doing GSoC, so we are happy to support that too.
Note that we cannot promise anyone a GSoC position or anything of that sort. We can only say we support anyone who wishes to contribute to the best of our ability and capacity.
Hello Everyone,
I am Alaa, a senior CSE student at GIU Berlin. I am writing to express my interest in the GBasis project as part of the GSoC program.
I applied last year to this particular project, but it was not selected for funding, as I later found out. I am willing to continue from where I left off by delving more into the tutorials and documentation, contributing and resolving issues, and refining my proposal.
As I work towards it, I will provide updates and seek your feedback.
Thanks for your consideration, and I am excited to work with you!
Hi, I'm Kunal, a CS student passionate about open source and computational science. I'm excited to contribute to QC-Devs using Python, C++, and Julia for quantum chemistry and scientific computing.
Hey @PaulWAyers @gabrielasd @marco-2023, I’ve reviewed Paul's overlap-screening document and found it very insightful for understanding the theoretical basis of the overlap screening we’ll implement in GBasis. As I understand it, the method uses an exponential decay term, evaluated with the smallest exponent in each shell and a threshold, to set distance cutoffs beyond which small overlaps are neglected. To make sure I am on the right track, I have a question:
- When extending is_overlap_included to handle three functions for spatial evaluations, should I focus on a similar exponential decay approach, or are there additional factors?
I am also currently reading the other attached documents and the textbook. It would be helpful if you could provide any more insights on the goals of this project!
Thanks, Divyansh Agrawal
Also, following up on my previous message: I just finished reading the second set of notes, which mentions 2-electron integrals. I have a question regarding this: for adapting is_overlap_included, the document suggests a multi-level approach (atom/shell/primitive). Should I implement all levels in GBasis, or prioritize one for the initial adaptation to one-electron integrals like kinetic energy?
The atom-level screening is the most important; after that level of screening one has the correct computational scaling.
Shell screening is the next most important. It dramatically reduces the prefactor.
Primitive screening may or may not be worth it: it reduces the cost (in terms of flops), but because of the compromises it may force on vectorization, it may not actually speed up the algorithm.
I would focus on the exponential factors. The polynomial factors are always sub-leading-order.
For atom-level screening, I plan to compute a single cutoff $d_{AB} = \max_{s,t} d_{As:Bt}$ per atom pair, as described in your notes, using the smallest exponents across all shells. Are there any edge cases where this might oversimplify things, or is this a good starting point?
This is a good starting point. There are always edge cases...the one I can think of is for very high angular momentum. But it is not so important for now and could always be treated later.
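The atom-pair cutoff discussed above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical (not GBasis API), each shell is represented simply by a tuple of its primitive exponents, and the shell-pair cutoff keeps only the exponential factor exp(-αβ/(α+β) d²) of the Gaussian product theorem. Because that cutoff grows as the exponents shrink, the maximum over all shell pairs is reached at the smallest exponent on each atom.

```python
import math


def pair_cutoff(alpha_a, alpha_b, tol):
    """Distance beyond which exp(-a*b/(a+b) * d**2) falls below tol."""
    return math.sqrt(-(alpha_a + alpha_b) / (alpha_a * alpha_b) * math.log(tol))


def atom_cutoff(shells_a, shells_b, tol=1e-8):
    """d_AB as the maximum shell-pair cutoff over shells s on A and t on B.

    shells_a, shells_b: lists of exponent tuples, one tuple per shell
    (a hypothetical, simplified representation of an atom's basis).
    """
    return max(
        pair_cutoff(min(s), min(t), tol) for s in shells_a for t in shells_b
    )


def atom_cutoff_from_min_exponents(shells_a, shells_b, tol=1e-8):
    """Same value, computed directly from the most diffuse primitive per atom."""
    alpha_a = min(min(s) for s in shells_a)
    alpha_b = min(min(t) for t in shells_b)
    return pair_cutoff(alpha_a, alpha_b, tol)
```

The second function avoids the double loop over shells, which matters little for one atom pair but keeps the atom-level pre-check cheap when it is applied to every pair in a large molecule.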
Thanks for your feedback!
I was also looking through the gbasis repository, and I see that the current function implements shell-level screening for overlap integrals. It checks whether screening is enabled via contractions_one.ovr_screen and, if so, computes a cutoff distance from the smallest exponent of each shell and the tolerance. The cutoff formula matches Paul's notes on overlap screening. It then compares the distance between the shell centers to this cutoff, returning True if the distance is less than the cutoff (the pair is kept) and False otherwise (the pair is skipped).
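The shell-level check described above can be sketched roughly as below. This is not the GBasis source: `Shell` is a hypothetical stand-in for a contraction object, `is_pair_included` is an illustrative name, and the cutoff is the one implied by the exponential factor of the Gaussian product theorem, which is assumed here to be the formula the notes use.

```python
import math
from dataclasses import dataclass


@dataclass
class Shell:
    """Hypothetical stand-in for a contracted shell (not the GBasis class)."""

    coord: tuple  # Cartesian center (x, y, z)
    exps: tuple  # primitive exponents


def overlap_cutoff(alpha_a, alpha_b, tol):
    """Distance beyond which exp(-a*b/(a+b) * d**2) drops below tol."""
    return math.sqrt(-(alpha_a + alpha_b) / (alpha_a * alpha_b) * math.log(tol))


def is_pair_included(shell_a, shell_b, tol=1e-8):
    """Return True if the shell pair should be kept (not screened out)."""
    # The most diffuse primitives (smallest exponents) decay the slowest,
    # so they give the most conservative cutoff for the whole shell pair.
    cutoff = overlap_cutoff(min(shell_a.exps), min(shell_b.exps), tol)
    return math.dist(shell_a.coord, shell_b.coord) < cutoff
```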
I have a rough plan on how to expand it:
1. Since atom-level screening sets the scaling by reducing the number of atom pairs processed, I’ll add a pre-check before the shell-level logic. This will be done using the $d_{AB}$ formula.
2. Since the current shell-level screening handles only the overlap integral, I’ll expand it to handle other one-electron integrals like kinetic energy or moments.
3. For the part of the project where we need to check three shells at once (like when calculating electron density at specific points in space), I’ll make a new version of the function called something like is_overlap_included_three. Instead of just checking two shells, it’ll look at all three pairs: shell 1 with shell 2, shell 1 with shell 3, and shell 2 with shell 3. For each pair, I’ll use the same distance cutoff formula as before, and if even one pair is farther apart than its cutoff, the triple is skipped.
4. Next, we could focus on edge cases and on implementing screening of two-electron integrals.

This is a very rough sketch of what can be done. I would appreciate your insights on this and how it can be improved!
I think this is a reasonable strategy. The right way to handle step 3 may be different; we would need to think about it. Having a clean simple code for 2-overlap screening is very sensible; it might be that beyond 2-fold screening we should write arbitrary-order screening.
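For reference, the pairwise proposal for three shells extends naturally to any number of shells, as in the sketch below. It is only one possible reading of "arbitrary-order screening", and, as noted above, the right approach may turn out to be different. The function names and the `(coord, exps)` tuple representation are hypothetical.

```python
import math
from itertools import combinations


def pair_cutoff(alpha_a, alpha_b, tol):
    """Distance beyond which exp(-a*b/(a+b) * d**2) falls below tol."""
    return math.sqrt(-(alpha_a + alpha_b) / (alpha_a * alpha_b) * math.log(tol))


def is_group_included(shells, tol=1e-8):
    """Keep a group of shells only if every pair is within its cutoff.

    shells: sequence of (coord, exps) tuples with two or more entries.
    """
    for (coord_a, exps_a), (coord_b, exps_b) in combinations(shells, 2):
        cutoff = pair_cutoff(min(exps_a), min(exps_b), tol)
        if math.dist(coord_a, coord_b) >= cutoff:
            # One negligible pair overlap makes the whole product negligible.
            return False
    return True
```

The number of pair checks grows quadratically with the group size, which is the scaling concern raised about the pairwise approach.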
Ah right, I see. You mean that using pairwise cutoffs for three shells might not be the best long-term approach: it could scale poorly if we had to look at more pairs, so an arbitrary-order screening function could be more flexible for handling three or more shells, e.g. for spatial evaluations or even the two-electron integral goal.
Regarding that, I had a question:
For an arbitrary-order screening function, would you suggest combining the exponential decays of all shells into a single metric (like a product or sum of exp(-alpha d^2) terms), or should it factor in the grid point’s position for spatial evaluations to decide if the group contribution is significant?
The basic strategy could be thought of as a Gaussian screening, but put a very tight Gaussian on grid points. Each grid point only has contributions from a very limited number of basis-function-products. One way to do this would be to make a list of nonnegligible basis-function products, and then make another list indicating which basis function products are relevant to each grid point. (It may be better, in practice, to then cluster grid points that need the same sets of basis functions).
There will be a tension here between the effort expended in the screening, the effort expended in the computation, and the ease of vectorization. One doesn't have to do the very best job; it's important to be in the ballpark of optimal while retaining simplicity.
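A rough sketch of the two-list idea follows, assuming for simplicity that every basis function is a single s-type primitive given as `(center, exponent)` and that polynomial factors are ignored. All names are hypothetical. By the Gaussian product theorem, a product of two Gaussians is a Gaussian with exponent p = α + β centered at P = (αA + βB)/p, scaled by K = exp(-αβ/p · |A − B|²). The product is therefore below `tol` at any point farther than sqrt(ln(K/tol)/p) from P.

```python
import math


def product_gaussian(center_a, alpha, center_b, beta):
    """Gaussian product theorem: return (center P, exponent p, prefactor K)."""
    p = alpha + beta
    center = tuple((alpha * a + beta * b) / p for a, b in zip(center_a, center_b))
    prefactor = math.exp(-alpha * beta / p * math.dist(center_a, center_b) ** 2)
    return center, p, prefactor


def significant_products(functions, tol):
    """First list: pairs (i, j) whose product is nonnegligible somewhere."""
    products = []
    for i, (center_a, alpha) in enumerate(functions):
        for j in range(i + 1):
            center_b, beta = functions[j]
            center, p, k = product_gaussian(center_a, alpha, center_b, beta)
            if k >= tol:
                # Radius beyond which K * exp(-p * r**2) is smaller than tol.
                radius = math.sqrt(math.log(k / tol) / p)
                products.append((i, j, center, radius))
    return products


def products_per_point(points, products):
    """Second list: for each grid point, the products that matter there."""
    return [
        [(i, j) for i, j, center, radius in products
         if math.dist(point, center) <= radius]
        for point in points
    ]
```

The brute-force loop in `products_per_point` is the simplest thing that works. Clustering grid points that share the same product list, as suggested above, would be a later refinement.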
Hi, I am a 5th-year PhD student working in quantum chemistry. Currently I am dealing with 2-electron integrals and solving the time-dependent Schrödinger equation; for that I am using PySCF for the integrals, along with Fortran, Python, and Julia. I am also using Gaussian basis functions. It would be great if I could work on this project.
@KKirti0001 thanks for your interest. We are of course happy to have help from everyone, in accord with QC-Devs contributing guidelines.
Hi @PaulWAyers, I was going through the notes and the discussion here. So basically, for every grid point, a radius search can be performed to find all products within that screening radius?
Yep, something like that. There are better ways to do it (using KDTrees) too.
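A minimal sketch of the KD-tree variant, using `scipy.spatial.KDTree` and its `query_ball_point` method. The tree is built once over the grid points, and each basis-function product is then queried by its center and screening radius. The function name and the inputs `centers` and `radii` are hypothetical; how the radii are obtained is left to the screening formula.

```python
import numpy as np
from scipy.spatial import KDTree


def grid_points_per_product(points, centers, radii):
    """For each product, return the indices of grid points within its radius.

    points:  (n_points, 3) array of grid coordinates
    centers: (n_products, 3) array of product-Gaussian centers
    radii:   (n_products,) array of screening radii
    """
    tree = KDTree(points)  # built once, reused for every product
    return [
        sorted(tree.query_ball_point(center, radius))
        for center, radius in zip(centers, radii)
    ]
```

Building the tree over grid points rather than over product centers lets each product use its own radius. The reverse arrangement would need a single search radius shared by all products.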
Hi, can I work on this?
General strategy
The evaluation of observables and the computation of integrals throughout gbasis rely on the construct_array_contraction method. This method is redefined depending on the quantity to compute, but in all cases it evaluates the desired quantity for one (e.g. density evaluation, basis evaluation), two (e.g. evaluation of overlap), or more contractions. The general idea is to add a screening step to this method in every case.
Technical Details:
The conversion rules for one-body terms are:
- The screening method is the same for all x-contraction based quantities where x indicates the number of contractions/indices required for the quantity.
- The general idea would be to create, as a first instance, functions, one for the one-contraction quantities, one for the two-contraction quantities, etc. These functions should have a tolerance value as an additional argument.
- The functions would then be used inside the corresponding construct_array_contraction with the appropriate tolerance values.
- For this, a new module called screening.py should be added.
Additional guidance.
For the overlap integrals, the screening functionality is already added; see construct_array_contraction. These instructions, if extracted, correspond to the function for screening all two-index quantities.
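One way an extracted two-index check could plug into a construct_array_contraction-style routine is sketched below. Everything here is illustrative: `is_pair_screened` and `s_overlap` are hypothetical names with simplified signatures, and normalized s-type primitives stand in for real contractions. For that special case the overlap is (2√(αβ)/(α+β))^{3/2} · exp(-αβ/(α+β) d²), whose prefactor never exceeds 1, so replacing a screened value by zero changes the result by less than the tolerance.

```python
import math


def is_pair_screened(center_a, alpha, center_b, beta, tol):
    """Return True if the pair can be skipped (exponential factor below tol)."""
    cutoff = math.sqrt(-(alpha + beta) / (alpha * beta) * math.log(tol))
    return math.dist(center_a, center_b) > cutoff


def s_overlap(center_a, alpha, center_b, beta, tol=None):
    """Overlap of two normalized s-type primitives, with optional screening."""
    # Early exit: the screening step runs before any integral work is done.
    if tol is not None and is_pair_screened(center_a, alpha, center_b, beta, tol):
        return 0.0
    p = alpha + beta
    dist_sq = math.dist(center_a, center_b) ** 2
    prefactor = (2.0 * math.sqrt(alpha * beta) / p) ** 1.5
    return prefactor * math.exp(-alpha * beta / p * dist_sq)
```

Comparing the screened and unscreened values over a range of distances, as a correctness test would, gives a direct check that the screening error stays below the tolerance.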
Implementation Tasks
- [ ] Create basic module structure.
- [ ] Implement function for 1-index screening.
- [ ] Implement function for 2-index screening.
- [ ] Add screening for all evaluation and one-electron integrals.
Stretch goal:
- [ ] Implement function for 4-index screening.
- [ ] Add screening to two-electron integrals.
Expected Methods
- one_index_screening(contractions, tol)
- two_index_screening(contractions_one, contractions_two, tol)
Stretch goal:
- four_index_screening(...)