Faster controlled unitary operations
What should we add?
A comparison of Qiskit vs Qrisp was given here: https://www.linkedin.com/posts/raphael-seidel-306449200_quantumcomputing-qrisp-quantumalgorithms-activity-7336696357149175808-3yM-/?rcm=ACoAAADYVgEBWEM9yILRx5xtdFtUxxdEiMg_kvk
The relatively slow performance of Qiskit has been traced down to the controlled-unitary calls in the PhaseEstimation circuit function here: https://github.com/Qiskit/qiskit/blob/25dec6a06381f02361b6c815953e766a55c672a7/qiskit/circuit/library/phase_estimation.py#L100
unitary.power(2**j) is fast but unitary.power(2**j).control() takes a long time.
We of course should improve our performance and quality of synthesis. I imagine that #14479 is going to be one of the major steps, though also the construction time of PhaseEstimation is likely to be slower than the new-in-2.0 method phase_estimation.
Fwiw, as with many claims the Qrisp team make loudly, some of the problem here is that they're doing something completely different to what Qiskit does (often, though perhaps not in this case, using Qiskit in a less-than-efficient manner, too), and then making a direct comparison. For example, here - they are trumpeting Qrisp's resource estimation - which is totally fine, and a nice feature! - and comparing it to the time of Qiskit synthesising a circuit. Qiskit doesn't offer ahead-of-time resource estimation, and it would be fine for them to say that, but claiming that they have a $10^9\times$ speedup over Qiskit is incredibly misleading, yet par for the course for them.
If they want to compare their resource-estimation strategies, they should compare to a library that offers resource estimation. I have no problem with them showing their nice improvement over Qiskit in the actual synthesis, nor bopping us for them having a feature we don't, but it grates on me when they make such loud and disingenuous comparisons.
Oh for sure the comparison is not a good one. However, in this case it did lead to the realization, and least on my part, that adding the control is very costly in the circuit construction itself.
It is for sure, and it's in large part because Qiskit does eager synthesis of controlled gates. That's not a good thing, though #14479 should be a step towards it being fixed by default.
We'll still always come off way worse in this sort of comparison, though, because we don't offer ahead-of-time resource estimation. Even with the synthesis of control deferred to compilation time, we still don't offer a resource-estimation method, so we'd still have to do the synthesis to "estimate" the resulting CX count. It'd be totally fair if they said that.
I don't know if/when Qiskit itself will offer resource estimation - I don't think we've got NISQ-y ahead-of-time estimation on our roadmap of planned features, in part because there are other libraries that offer it.
I doubt it would be useful, at least in my opinion; I am not running 1e5 2Q gates any time soon.
There are some very basic optimizations we ought to do in PauliEvolutionGate.control and .power to use the fact that we only have to control a single Pauli rotation to control the full evolution (instead of controlling every single gate) and that .power just has to multiply the time (instead of repeating the gates -- or worse computing the unitary and re-synthesizing a matrix power). #14581 implements these simple improvements and gives the following plot:
Mainly:
- we now have a better (non-exponential) gate count
- our runtime is signficantly reduced (the 1-qubit case looks like a blip)