
Estimator and Sampler caching

pedrorrivero opened this pull request 1 year ago • 15 comments

Summary

This PR refactors and introduces caching for the local implementations of Sampler and Estimator based on Statevector.

Details and comments

Caching avoids unnecessary and redundant deepcopies of the target QuantumCircuits, which were previously triggered both when binding the parameters (even if the circuit was not parametrized) and on Statevector construction. These updates improve the runtime of certain problems, such as computing the CH3 molecule, by more than 20x.
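The mechanism described above can be sketched in plain Python with functools.lru_cache. This is a minimal illustration with hypothetical names (bound_statevector, circuit_key), not the actual PR code; in the real implementation the cache key would be derived from the circuit and its parameter values:

```python
from functools import lru_cache

BIND_CALLS = 0  # counts how often the expensive step actually runs

@lru_cache(maxsize=256)
def bound_statevector(circuit_key, parameter_values):
    """Stand-in for the expensive step (deepcopy + Statevector
    construction); computed once per (circuit, binding) pair."""
    global BIND_CALLS
    BIND_CALLS += 1
    return ("statevector", circuit_key, parameter_values)

# Five identical queries trigger the expensive step only once;
# the remaining four are served from the cache.
for _ in range(5):
    bound_statevector(0, (0.1, 0.2))
```

Repeated evaluations of the same circuit/binding pair then skip the deepcopy entirely, which is where the reported speedup comes from.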

pedrorrivero avatar Jul 19 '22 04:07 pedrorrivero

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are requested to review this:

  • @Qiskit/terra-core
  • @ajavadia
  • @ikkoham
  • @levbishop
  • @t-imamichi

qiskit-bot avatar Jul 19 '22 04:07 qiskit-bot

Pull Request Test Coverage Report for Build 2758096258

  • 96 of 99 (96.97%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 84.002%

Changes missing coverage:

  File                                        Covered  Changed/Added  %
  qiskit/primitives/estimator.py              26       27             96.3%
  qiskit/primitives/statevector_primitive.py  18       19             94.74%
  qiskit/primitives/utils.py                  33       34             97.06%
  Total                                       96       99             96.97%

Totals Coverage Status
Change from base Build 2756896283: +0.01%
Covered Lines: 55942
Relevant Lines: 66596

💛 - Coveralls

coveralls avatar Jul 19 '22 05:07 coveralls

Thank you for the proposal of the performance improvement. Could you add a link to a script to reproduce the performance test? I think there are two ideas in this PR: one is to avoid parameter binding when there are no parameters, and the other is the lru_cache of the parameter binding. I think we need to evaluate the contributions independently. I agree with including the first idea, but I'm not sure about the contribution of the lru_cache, because VQE assigns different values in general. At a minimum we have to set the capacity of the lru_cache; otherwise it may cause a memory overflow when running VQE with many iterations.

https://docs.python.org/3.9/library/functools.html#functools.lru_cache

If maxsize is set to None, the LRU feature is disabled and the cache can grow without bound.
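The bounded behavior the quote warns about can be demonstrated in a few lines (toy function, not PR code): with a finite maxsize, the least-recently-used entry is evicted once the cache fills, so memory stays bounded regardless of how many distinct keys are seen.

```python
from functools import lru_cache

@lru_cache(maxsize=2)  # bounded cache: LRU entries are evicted
def square(x):
    return x * x

# Three distinct keys, but only two fit: the first is evicted.
for x in (1, 2, 3):
    square(x)
```

With maxsize=None the currsize would instead grow with every distinct VQE parameter vector, which is exactly the overflow concern raised above.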

t-imamichi avatar Jul 20 '22 06:07 t-imamichi

Thank you for the proposal of the performance improvement. Could you add a link to a script to reproduce the performance test?

Hi, thanks for looking over the submission. Here is a notebook I created showing a 50x speedup for one of our actual problems of interest.

https://github.com/calebj15/file-sharing/blob/main/time_test_caching_estimator.ipynb

caleb-johnson avatar Jul 20 '22 14:07 caleb-johnson

@t-imamichi @ikkoham

I think there are two ideas in this PR: one is to avoid parameter binding when there are no parameters, and the other is the lru_cache of the parameter binding.

These two things address the same issue in different places: deepcopies being triggered unnecessarily.

  • Parameter binding
  • Statevector construction

At least we have to set the capacity of lru_cache, otherwise it may cause memory overflow with VQE with a lot of iterations.

I agree, and I was running some checks to come up with the right value. Circuits returned by random_circuit(num_qubits=100, depth=30) occupy roughly 1 MiB in memory (obtained by recursively calling sys.getsizeof). With this in mind, I would not recommend going over maxsize=1024, since that would amount to a cache of roughly 1 GiB. What do you think?
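A recursive sys.getsizeof measurement like the one mentioned above can be sketched as follows (this is an assumed reconstruction, not the exact script used; it walks containers and object __dict__ attributes while guarding against shared references):

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive memory estimate of an object graph.

    A sketch of the kind of measurement described in the comment;
    real QuantumCircuits hold many nested instruction objects, so
    a plain sys.getsizeof badly underestimates them.
    """
    if seen is None:
        seen = set()
    oid = id(obj)
    if oid in seen:  # count shared objects only once
        return 0
    seen.add(oid)
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        size += deep_getsizeof(vars(obj), seen)
    return size
```

Multiplying such a per-circuit estimate by the cache maxsize gives the rough 1 GiB bound quoted above.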

I have already addressed everything else. Let me know 🙂

pedrorrivero avatar Jul 20 '22 16:07 pedrorrivero

The size of a Statevector scales exponentially, so we cannot construct a 100-qubit Statevector for a circuit like random_circuit(num_qubits=100, depth=30).

ikkoham avatar Jul 21 '22 02:07 ikkoham

The size of a Statevector scales exponentially, so we cannot construct a 100-qubit Statevector for a circuit like random_circuit(num_qubits=100, depth=30).

Fair point, it was just meant as a memory-allocation reference. random_circuit(num_qubits=25, depth=120) should give something roughly equivalent. In any case, if this is too much, it would only be a matter of decreasing the cache maxsize, which is currently set to 256.

pedrorrivero avatar Jul 21 '22 02:07 pedrorrivero

With caching, we can't free the underlying resources as long as the cache holds a reference, so we end up holding on to huge data. Is this correct?

How about len(self._circuits) instead of the fixed number 256? Or, again, it may be nicer to cache only if there are no parameters.

ikkoham avatar Jul 21 '22 04:07 ikkoham

With caching, we can't free the underlying resources as long as the cache holds a reference, so we end up holding on to huge data. Is this correct?

~I think this might be correct, yes. At least I cannot think of a way of manually freeing up resources with the current implementation.~ This is, of course, until we delete the estimator object. Whether the amount of data is huge or not depends on the maxsize.

How about len(self._circuits) instead of fixed number 256?

I need to establish maxsize at the class level, not at the instance level, so I cannot access self.

Or, again, it may be nice to cache only if there are no parameters

Same issue here, I have to decide whether the function is going to be cached at the class level, and I do not know whether there are going to be parameters or not until I instantiate an object. I could have two separate methods and call one or the other depending on whether we have parameters or not... but that seems like an odd solution, I need to think about it. Maybe with that we could have two separate caches with different maxsize, keeping some reduced memory (e.g. maxsize=64) for the case with parameters, and a larger one for the case without parameters (e.g. maxsize=256).
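The two-cache idea floated above could look roughly like this (a sketch with hypothetical names and the maxsize values suggested in the comment; note that lru_cache on a method also keys on self, which is a known caveat of this pattern):

```python
from functools import lru_cache

class TwoCacheSketch:
    """Hypothetical split: a small cache for parameterized circuits
    and a larger one for parameter-free circuits, chosen per call."""

    @lru_cache(maxsize=64)   # smaller budget: bindings rarely repeat
    def _state_with_parameters(self, circuit_key, values):
        # lru_cache includes `self` in the key, keeping the
        # instance alive for the lifetime of the cache entry.
        return ("bound", circuit_key, values)

    @lru_cache(maxsize=256)  # larger budget: hits are guaranteed
    def _state_without_parameters(self, circuit_key):
        return ("unbound", circuit_key)

    def state(self, circuit_key, values=()):
        if values:
            return self._state_with_parameters(circuit_key, tuple(values))
        return self._state_without_parameters(circuit_key)
```

The dispatch in state() is what replaces the "two separate methods" decision mentioned above, while still letting each cache have its own maxsize.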

On top of that, I do see benefits of caching with parameters. Some users may always request the same finite set of parameters for a certain application (e.g. to measure different observables in Estimator), in which case caching would be very useful. You may then ask: why not bind them beforehand? In this scenario the memory usage is the same with or without caching, so that is not really relevant, and it is still more flexible and convenient for the user to have automatic binding capabilities.

The issue with overflowing the memory in cases where we are not repeating parameters is addressed by decreasing maxsize. I am sure we can come up with a sensible bound.

pedrorrivero avatar Jul 21 '22 04:07 pedrorrivero

@ikkoham there are ways of clearing the cache. I would have to work out a solution for it though, I am not sure if that is something that we need to include in this PR.

https://docs.python.org/3/library/functools.html#functools.lru_cache

The decorator also provides a cache_clear() function for clearing or invalidating the cache.
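Clearing works as a simple toy example shows (illustrative function, not PR code): cache_clear() drops every stored entry, releasing the references the cache holds.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cached(x):
    return x + 1

cached(1)
cached(1)  # second call is served from the cache
size_before = cached.cache_info().currsize  # 1 entry held
cached.cache_clear()  # drops all entries, freeing the references
size_after = cached.cache_info().currsize   # back to 0
```

An estimator could expose such a call (or invoke it on deletion) if manual freeing turned out to be needed.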

pedrorrivero avatar Jul 21 '22 13:07 pedrorrivero

Thank you for your explanation.

Why do you want to cache at the class level? I feel like instance level is fine. (Is there any specific use case?)

I suppose it makes sense to cache Statevector when there are no parameters, but I don't see the point of caching it when there are parameters. I can't think of a use case where the same parameter is queried repeatedly. Parameters should be updated. The reference implementation is only an example of an interface implementation, and it is for tests (esp. algorithms) in Terra. If there is no such test, there is no need to improve performance.

ikkoham avatar Jul 22 '22 00:07 ikkoham

Why do you want to cache at the class level? I feel like instance level is fine.

The cache is stored at the instance level, but its size is declared at the class level (unless we start implementing closures, which would just clutter the code).

I can't think of a use case where the same parameter is queried repeatedly. Parameters should be updated.

Say we have one parametrized quantum circuit, a small set of n parameter bindings, and a plethora of observables (m >> n). In this case, to get the expectation value for all n*m combinations we need n statevectors. With caching we can do this by performing only ~n deepcopies, whereas without it we would need ~n*m deepcopies. This is a rather likely scenario, and a massive performance improvement even if just for testing (i.e. an arbitrary factor m, up to orders of magnitude). Note that this is independent of the cache size s as long as s ≥ n (i.e. s ≠ s(m)).
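The n-versus-n*m counting argument above can be verified with a small simulation (hypothetical names; the counter stands in for the deepcopy cost):

```python
from functools import lru_cache
import itertools

DEEPCOPIES = 0  # stand-in for the deepcopy cost being counted

@lru_cache(maxsize=None)
def bound_circuit(binding):
    """One expensive deepcopy per distinct parameter binding."""
    global DEEPCOPIES
    DEEPCOPIES += 1
    return ("circuit", binding)

bindings = [(0.1,), (0.2,), (0.3,)]        # n = 3
observables = ["Z", "X", "Y", "ZZ", "XX"]  # m = 5

# All n*m = 15 combinations are evaluated, but only n = 3 copies occur.
for binding, obs in itertools.product(bindings, observables):
    _ = (bound_circuit(binding), obs)
```

Without the cache the loop body would copy on every iteration, i.e. n*m = 15 times instead of 3.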

pedrorrivero avatar Jul 22 '22 04:07 pedrorrivero

I investigated the bottleneck of Estimator with Caleb's script. I noticed that we can avoid some parameter bindings invoked within Statevector, and opened PR #8403. It achieves some speed-up, though less than caching. It might be useful to combine both #8367 and #8403.

t-imamichi avatar Jul 27 '22 09:07 t-imamichi

I have reconsidered this issue. Primitives could be more efficient if they cached probability distributions (state.probabilities(qargs=qargs)), and if Estimator cached expectation values (np.real_if_close(state.expectation_value(observable))) and variances (sq_exp_val - expectation_value**2 when shots is not None). Especially for Estimator, I think it would be a big improvement in terms of memory size.

ikkoham avatar Jul 28 '22 07:07 ikkoham

@t-imamichi thanks for the PR! I will look at #8403 and combine any extra logic 🙂

@ikkoham caching expectation values would spoil the arbitrary increase in speed, since the cache would work on circuit-observable combinations instead of just on circuits (which is what introduces the deepcopying overhead). Consequently, for the same circuit but a different observable, we would still trigger unnecessary copies simply because the cache would not recognize the observable argument.
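The key-granularity point above can be made concrete with two toy caches (hypothetical names, not either PR's code): one keyed on the circuit alone and one keyed on the circuit-observable pair. Sweeping m observables over a single circuit hits the first cache m-1 times but misses the second cache m times.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def per_circuit(circuit_key):
    """Cache keyed on the circuit only (what the PR does)."""
    return ("state", circuit_key)

@lru_cache(maxsize=None)
def per_pair(circuit_key, observable):
    """Cache keyed on (circuit, observable) (the proposed alternative)."""
    return ("expval", circuit_key, observable)

for obs in ["X", "Y", "Z", "XX", "YY"]:
    per_circuit(0)    # same key every iteration: only the first misses
    per_pair(0, obs)  # fresh key every iteration: every call misses
```

So caching at the expectation-value level recomputes (and recopies) once per observable, which is exactly the overhead the circuit-level cache avoids.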

pedrorrivero avatar Aug 09 '22 12:08 pedrorrivero

Closing as this is nearly 2 years old and we don't plan on adding improvements to the V1 interface.

ihincks avatar Apr 19 '24 01:04 ihincks