SQD Tutorial: Adding a regularization term during LUCJ initialization optimization
URL to the relevant tutorial
https://quantum.cloud.ibm.com/docs/en/tutorials/sample-based-quantum-diagonalization
Select all that apply
- [x] new content request
- [ ] typo
- [ ] code bug
- [ ] out-of-date content
- [ ] broken link
- [ ] other
Describe the fix or the content request.
Adding a regularization term to the optimization of the $t$-amplitudes significantly speeds up the initialization of the LUCJ ansatz:
The regularization term could also result in higher accuracy, as the ffsim documentation suggests: https://qiskit-community.github.io/ffsim/explanations/lucj.html
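For concreteness, a minimal sketch of where the option would go, assuming the tutorial's existing `t1`, `t2`, `n_reps`, and interaction-pair variables (illustrative only, not the exact notebook cell):

```python
import ffsim

# Same ansatz construction as in the tutorial; the only proposed change is the
# regularization argument. t1, t2, n_reps, alpha_alpha_indices, and
# alpha_beta_indices are assumed to come from the tutorial's CCSD/layout setup.
ucj_op = ffsim.UCJOpSpinBalanced.from_t_amplitudes(
    t2=t2,
    t1=t1,
    n_reps=n_reps,
    interaction_pairs=(alpha_alpha_indices, alpha_beta_indices),
    optimize=True,       # "compressed" factorization, as in the tutorial
    regularization=0.1,  # proposed addition
)
```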
Minimal run on stretched nitrogen $R=2.1$ Å:
For new content requests - if the request is accepted, do you want to write the content?
I will write (or already have written) a draft of the proposed content
Thanks for opening this. I verified that adding the `regularization` option reduces the time to compute the ansatz circuit. You show a change from 2m to 23s. I found 2m to 5s with the same value, $0.1$, that you used.
Is the plot you show above from exactly the same experiment as the one that produced the plot in the tutorial, except for the addition of the `regularization` option? Assuming they are the same experiment, it looks to me like the energy estimate is about 3.5 times worse (further from "chemical accuracy") with the regularization. Furthermore, the occupancy chart is quite different.
The example in the ffsim doc you linked uses `regularization=0.01` rather than `0.1` as you do. I tried 0.01 as well, and the execution time increased (for me) from 5s to 35s. (This is a 16-thread machine.)
Before approving this, I'd want to see, at a minimum, a careful comparison of what effect this change has on the outcome of the experiment. Even if you can find a value for the regularization that improves the energy estimate, my opinion is that, depending on how the estimate varies with the regularization, this might be gaming the result in an uncontrolled way. Of course, there could be more arguments to be made for including regularization.
I do think this is an interesting direction to explore. But at present, the results don't improve the story in the tutorial.
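For reference, the timing comparison above can be reproduced with a sketch along these lines (a hypothetical helper, not the exact script I used; `t1`, `t2`, `n_reps`, and the interaction-pair indices are assumed from the tutorial):

```python
import time

import ffsim

def build_ansatz(regularization=None):
    # Build the LUCJ ansatz as in the tutorial, optionally with regularization.
    kwargs = dict(
        t2=t2,
        t1=t1,
        n_reps=n_reps,
        interaction_pairs=(alpha_alpha_indices, alpha_beta_indices),
        optimize=True,
        options=dict(maxiter=1000),
    )
    if regularization is not None:
        kwargs["regularization"] = regularization
    return ffsim.UCJOpSpinBalanced.from_t_amplitudes(**kwargs)

# Compare construction time without regularization and with the two values
# discussed above.
for reg in (None, 0.1, 0.01):
    start = time.perf_counter()
    build_ansatz(reg)
    print(f"regularization={reg}: {time.perf_counter() - start:.1f} s")
```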
Sorry, I was doing it in the dissociation region, hence the poor convergence. I will try to demonstrate with a notebook.
Okay, so I have two separate runs, one without regularization and one with regularization ($\lambda = 0.1$), @jlapeyre. This is for nitrogen in cc-pVDZ. It converges significantly faster with almost exactly the same dimension of the subspace Hamiltonian, meaning that the ansatz itself turned out better. When you optimize the $t_2$-amplitudes without regularization, you are essentially overfitting.
How would you like me to include this in the existing notebook? Should I just add a regularization term without explaining why, or should I show this kind of numerical study as an extra section, @jlapeyre?
@kevinsung can you also take a look?
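If this ends up as an extra section, a small helper along these lines could render the comparison (a sketch only; the per-iteration energies and the reference energy would come from the two runs):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_convergence_comparison(energies_no_reg, energies_reg, reference_energy,
                                chemical_accuracy=1.6e-3):
    # Plot the SQD energy error per iteration for the run without regularization
    # and the run with lambda = 0.1. Energies and the reference are in Hartree.
    for energies, label, marker in [
        (energies_no_reg, "no regularization", "o"),
        (energies_reg, r"$\lambda = 0.1$", "s"),
    ]:
        errors = np.abs(np.asarray(energies) - reference_energy)
        plt.plot(range(1, len(errors) + 1), errors, marker + "-", label=label)
    plt.axhline(chemical_accuracy, linestyle="--", color="gray", label="chemical accuracy")
    plt.yscale("log")
    plt.xlabel("SQD iteration")
    plt.ylabel("Energy error (Ha)")
    plt.legend()
    plt.show()
```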
Is this data from a numerical simulation? So you actually computed the state vector (it would be around 70GB) and sampled bitstrings from it? Also, what is the bond length?
Nope, this is SQD run on bitstrings sampled from real quantum hardware (ibm_fez). The bond length is $R = 1.1$ Å (equilibrium). Here is a comparison with the dissociation regime ($R = 2.1$ Å) as well, with $\lambda = 0.1$ regularization applied to both:
Thanks. These plots answer some of my concerns.
> without regularization, you are essentially overfitting.
My worry about tuning an ad-hoc parameter till success was misplaced.
Since this is a tutorial, I think something like this is good: "to prevent overfitting, we introduce a regularization term with ...". That would be a minimal explanation, and so would avoid distracting from the main point. On the other hand, showing results with and without regularization would make the effect of regularization more clear. If this is really an essential part of a realistic workflow, then I think showing the effect explicitly would be great.
I think this suggested change makes sense in some form. Of course, the author of ffsim @kevinsung should weigh in.
Okay, I have attached a tentative notebook with the proposed changes, along with two example images of results that would go into the numerical study section. How should we proceed with this?
> To prevent overfitting and to significantly speed up the optimization and increase its accuracy, we introduce a regularization term by setting `regularization=0.1` as an additional argument.
Maybe this is a bit too much in one sentence? How about...
> We add a regularization term to the objective function by passing the argument `regularization=0.1` when constructing the spin-balanced UCJ ansatz. Regularization prevents overfitting, thereby increasing accuracy. It also makes constructing the ansatz much faster. (see The ...
I'm okay with adding the regularization. I don't think we need to add any prose to the notebook. A short comment in the code should suffice. Something like this
```python
ucj_op = ffsim.UCJOpSpinBalanced.from_t_amplitudes(
    t2=t2,
    t1=t1,
    n_reps=n_reps,
    interaction_pairs=(alpha_alpha_indices, alpha_beta_indices),
    # Setting optimize=True enables the "compressed" factorization
    optimize=True,
    # Enable regularization to speed up the optimization and potentially improve results
    regularization=0.1,
    # Limit the number of optimization iterations to prevent the code cell from running
    # too long. Removing this line may improve results.
    options=dict(maxiter=1000),
)
```
If we no longer need to limit the optimization iterations, then we can also delete the lines
```python
    # Limit the number of optimization iterations to prevent the code cell from running
    # too long. Removing this line may improve results.
    options=dict(maxiter=1000),
```
If the intended audience is more-or-less experts, then omitting prose is fine. It makes the tutorial a bit more like a journal article.
But really, the comment in the code does most of what you want.
Sorry, I've changed my mind on this. For the tutorial, I don't think we should add regularization for now. The reason is that I have some numerical results (soon to be published) showing that at least in the noiseless case, regularization can hurt the SQD energy. I think it deserves further study before we recommend using it for SQD.
To make the tutorial run faster, we can change the line `options=dict(maxiter=1000)` to use `maxiter=100` or even 10.
I do agree that regularization can hurt SQD in some cases, @kevinsung, but for N2 in cc-pVDZ I've found it to work with an arbitrarily chosen $\lambda = 0.1$ across all bond lengths. It requires 5-6 fewer iterations to hit chemical accuracy with the settings I have used, which amounts to 12+ hours less computational time on 48 CPUs (since the last iterations become very slow).
If you'd like my results and want to consider them in your work, I'll gladly hand them over. I can also do a more comprehensive study on this specifically; just contact me. An interesting idea could be to treat the regularization strength as a hyperparameter that is tuned before running the optimization, so we don't use arbitrary values (a rough sketch of such a sweep is below). I think it's interesting and could help in understanding how to initialize properly in different cases.
We shouldn't encourage "always regularize because it's always better," but in this tutorial I believe it's good since it reduces optimization time and improves results.
I also think the last plot in the tutorial should be replaced, or we should supply an example with more iterations and samples per batch (for example from my HPC run), since the current plot in the tutorial is not really convincing that it is heading towards chemical accuracy (it just flattens out).
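A rough sketch of the kind of pre-tuning sweep I have in mind (here `run_sqd_experiment` is only a placeholder for the sampling and SQD post-processing already in the tutorial, not a real function; the other variable names follow the tutorial):

```python
import ffsim

# Sweep a few regularization strengths and keep the one giving the lowest
# (variational) SQD energy estimate.
results = {}
for lam in (0.01, 0.05, 0.1, 0.5):
    ucj_op = ffsim.UCJOpSpinBalanced.from_t_amplitudes(
        t2=t2,
        t1=t1,
        n_reps=n_reps,
        interaction_pairs=(alpha_alpha_indices, alpha_beta_indices),
        optimize=True,
        regularization=lam,
        options=dict(maxiter=1000),
    )
    # Placeholder: build the circuit, sample, and run SQD as in the tutorial,
    # returning the final energy estimate for this ansatz.
    results[lam] = run_sqd_experiment(ucj_op)

best_lambda = min(results, key=results.get)
print(f"Best regularization: {best_lambda}, energy: {results[best_lambda]}")
```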
> I do agree that regularization can hurt SQD in some cases, @kevinsung, but for N2 in cc-pVDZ I've found it to work with an arbitrarily chosen $\lambda = 0.1$ across all bond lengths.
If this is true in a noiseless simulation, then it would be more compelling. For example, I have found it not to be true for noiseless simulation in the 6-31G basis. I will try simulating the cc-pVDZ case, but it will take some time given the size of the system.
Ah, nope, I've only done it on real quantum hardware samples. I'm going to experiment a bit with noiseless simulations and come back to you if there is anything interesting.
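If it helps, here's a rough sketch of how noiseless samples could be generated with Qiskit Aer for a system small enough to simulate (e.g. a smaller basis or reduced active space, well below the ~70 GB cc-pVDZ state vector); the circuit construction mirrors the tutorial, and `norb`, `nelec`, and `ucj_op` are assumed from that setup:

```python
from qiskit import QuantumCircuit, QuantumRegister, transpile
from qiskit_aer import AerSimulator

import ffsim

# Build the LUCJ circuit as in the tutorial: Hartree-Fock reference followed by
# the spin-balanced UCJ operator under the Jordan-Wigner mapping.
qubits = QuantumRegister(2 * norb, name="q")
circuit = QuantumCircuit(qubits)
circuit.append(ffsim.qiskit.PrepareHartreeFockJW(norb, nelec), qubits)
circuit.append(ffsim.qiskit.UCJOpSpinBalancedJW(ucj_op), qubits)
circuit.measure_all()

# Sample bitstrings from an ideal (noiseless) statevector simulation instead of
# hardware; these counts can then be fed to the same SQD post-processing.
simulator = AerSimulator(method="statevector")
isa_circuit = transpile(circuit, simulator)
counts = simulator.run(isa_circuit, shots=100_000).result().get_counts()
```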