CUDA Quantum Interpreter: Provide semantics for `for_p`.

Context

Catalyst has recently added support for executing quantum programs in NVIDIA's CUDA Quantum platform. For example, in the following code, we see two identical quantum programs. Both programs will execute the RX gate with parameter a, and will return the state of the system.

import pennylane as qml
from catalyst import qjit
from catalyst.cuda import qjit as cjit, SoftwareQQPP

@qjit
@qml.qnode(qml.device("lightning.qubit", wires=1))
def foo(a):
   qml.RX(a, wires=0)
   return qml.state()

@cjit
@qml.qnode(SoftwareQQPP(wires=1))
def bar(a):
   qml.RX(a, wires=0)
   return qml.state()

These equivalent quantum programs run on different simulators: the first is specified to run on the lightning.qubit simulator, while the second is specified to run on the qpp-cpu simulator.

These equivalent quantum programs are written using PennyLane's API. However, in order to execute them on the qpp-cpu simulator, we first need to translate them into NVIDIA's CUDA Quantum Python API. The program above, written in NVIDIA's CUDA Quantum Python API, could look like the following:

import cudaq

def bar(a):
  kernel = cudaq.make_kernel()
  qreg = kernel.qalloc(1)
  qubit0 = qreg[0]
  kernel.rx(a, qubit0)
  return cudaq.get_state(kernel)

Goal

Support for translating quantum programs written in PennyLane's API into NVIDIA's CUDA Quantum Python API is limited at the moment. In particular, we don't have support for translating PennyLane's for loop statements into CUDA Quantum. Here is how one would express a for loop in PennyLane.

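import pennylane as qml
from catalyst import for_loop, qjit

# `backend` is assumed to hold a device name such as "lightning.qubit"
@qjit()
@qml.qnode(qml.device(backend, wires=6))
def circuit(n: int):
    qml.Hadamard(wires=0)

    # Apply CNOT(i, i + 1) for i = 0, ..., n - 2
    @for_loop(0, n - 1, 1)
    def loop_fn(i):
        qml.CNOT(wires=[i, i + 1])

    loop_fn()
    return qml.state()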
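
As a reading aid, the loop above can be modelled in plain Python. Catalyst's for_loop(lower_bound, upper_bound, step) iterates like range(lower_bound, upper_bound, step), so the body runs for i = 0, ..., n - 2. The helper name cnot_pairs is hypothetical and not part of Catalyst or PennyLane. It only lists which wire pairs the CNOT gates act on and does not run any quantum code.

```python
def cnot_pairs(n: int) -> list[tuple[int, int]]:
    """List the (control, target) wires touched by the loop in `circuit`."""
    pairs = []
    # Mirrors @for_loop(0, n - 1, 1): the upper bound is exclusive.
    for i in range(0, n - 1, 1):
        pairs.append((i, i + 1))
    return pairs


print(cnot_pairs(6))  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

With n = 6 the last CNOT targets wire 5, which is the highest wire available on a 6-wire device.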

CUDA Quantum's Python API also allows users to specify for loops.

The above program should therefore be translated to the following CUDA Quantum Python API calls.

import cudaq

def circuit(n: int):
  kernel = cudaq.make_kernel()
  # Allocate one qubit per wire of the PennyLane device (wires=6)
  qreg = kernel.qalloc(6)
  qubit0 = qreg[0]
  kernel.h(qubit0)

  # Loop body: CNOT between neighbouring qubits
  def loop(index):
    qubit_i = qreg[index]
    qubit_i_plus_1 = qreg[index + 1]
    kernel.cx(qubit_i, qubit_i_plus_1)

  kernel.for_loop(start=0, stop=n-1, function=loop)
  return cudaq.get_state(kernel)

Technical details

PennyLane's API calls are converted to CUDA Quantum's Python API via a custom JAX interpreter found in catalyst.cuda.catalyst_to_cuda_interpreter.py.
You will need to implement the semantics for Catalyst's JAX primitive for_p.
Write a function that takes an InterpreterContext and a for_p equation and inspects the parameters of for_p.
Construct a kernel.for_loop call that matches the semantics of for_p.
You may want to construct a JAX primitive for for_loop, along with convenience functions, similar to the other CUDA Quantum JAX primitives found in catalyst.cuda.catalyst.primitives.
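
The steps above can be sketched in plain Python without Catalyst or cudaq installed. Everything here is a hypothetical stand-in: FakeKernel only records calls and is not the CUDA Quantum kernel builder, ForEquation is a simplified view of a for_p equation (the real equation carries its bounds and body differently), and the interpreter context is reduced to the kernel itself. The only detail taken from the text is the shape of the call, kernel.for_loop(start=..., stop=..., function=...).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeKernel:
    """Hypothetical stand-in for a kernel builder; it only records calls."""
    calls: list = field(default_factory=list)

    def for_loop(self, start, stop, function):
        # Record the loop as a single instruction instead of unrolling it.
        self.calls.append(("for_loop", start, stop, function))


@dataclass
class ForEquation:
    """Hypothetical, simplified view of a for_p equation."""
    lower: int
    upper: int
    step: int
    body: Callable


def interpret_for(kernel, eqn):
    """Emit one kernel.for_loop call for a (simplified) for_p equation."""
    if eqn.step != 1:
        # The for_loop call shown in the Goal section takes no step argument.
        raise NotImplementedError("only step == 1 is handled in this sketch")
    kernel.for_loop(start=eqn.lower, stop=eqn.upper, function=eqn.body)


kernel = FakeKernel()
body = lambda index: None  # placeholder loop body
interpret_for(kernel, ForEquation(lower=0, upper=5, step=1, body=body))
print(kernel.calls[0][:3])  # ('for_loop', 0, 5)
```

Rejecting a step other than 1 is a design choice of this sketch. A real implementation might instead rescale the index inside the loop body.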

Feb 16 '24 20:02 erick-xanadu

Hi Erick, I have a question about the test cases in catalyst/frontend/test/pytest/test_cuda_integration.py.

I'm using the latest CUDA Quantum Docker container to run cudaq. I built the latest main branch inside the container, and the build process finished without any issues. However, when I try to run test_cuda_integration.py, the following two test cases fail consistently across different trials:

FAILED test_cuda_integration.py::TestCudaQ::test_control_ry - AssertionError: 
FAILED test_cuda_integration.py::TestCudaQ::test_swap - AssertionError:

I wrote a standalone test for the Hadamard gate, and it turns out that the cudaq backend's H gate produces a different result than the lightning backend's H gate. Here's my minimal example to reproduce this issue:

import pennylane as qml
from numpy.testing import assert_allclose

import catalyst
from catalyst import qjit
from catalyst.cuda import SoftwareQQPP

@qml.qnode(qml.device("lightning.qubit", wires=6))
def circuit_lightning():
    qml.Hadamard(wires=0)
    return qml.state()

@qml.qnode(SoftwareQQPP(wires=6))
def circuit():
    qml.Hadamard(wires=0)
    return qml.state()

cuda_compiled = catalyst.cuda.qjit(circuit)
catalyst_compiled = qjit(circuit_lightning)
expected = catalyst_compiled()
observed = cuda_compiled()
assert_allclose(expected, observed)
print("works")

Mismatched elements: 2 / 64 (3.12%)
Max absolute difference: 0.70710678
Max relative difference: 1.
 x: array([0.707107+0.j, 0.      +0.j, 0.      +0.j, 0.      +0.j,
       0.      +0.j, 0.      +0.j, 0.      +0.j, 0.      +0.j,
       0.      +0.j, 0.      +0.j, 0.      +0.j, 0.      +0.j,...
 y: array([0.707107+0.j, 0.707107+0.j, 0.      +0.j, 0.      +0.j,
       0.      +0.j, 0.      +0.j, 0.      +0.j, 0.      +0.j,
       0.      +0.j, 0.      +0.j, 0.      +0.j, 0.      +0.j,...

Are these results expected?

Feb 23 '24 21:02 zzzDavid

@zzzDavid huh, this is interesting. I have been running the version available on PyPI. I'll have to take a closer look. Thanks for noting this :)

Feb 23 '24 22:02 erick-xanadu

Hi @zzzDavid, I looked a bit more into this. I ran the following both locally and in the latest Docker container:

import cudaq

kernel = cudaq.make_kernel()
qreg = kernel.qalloc(7)
qubit = qreg[0]
kernel.h(qubit)
print(cudaq.get_state(kernel))

I can confirm I saw the behaviour you describe. When running in the Docker container, it outputs the following:

0.707107+0j 0.707107+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j

When running locally it outputs the following:

0.707107+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0.707107+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j 0+0j

I haven't been able to pinpoint the change in their repository that leads to this. Thanks for pointing this out.

Feb 26 '24 17:02 erick-xanadu

It might be this one: https://github.com/NVIDIA/cuda-quantum/pull/1082

Feb 26 '24 17:02 erick-xanadu

Hi @zzzDavid, just use version 0.6.0 for the time being, we will sort out the qubit ordering in a later PR. Thanks :)

Feb 26 '24 19:02 erick-xanadu

Thank you @erick-xanadu for investigating this!!! I have switched to 0.6.0 and it now works. I was traveling in the last few days, let me finish my PR today :)

Feb 29 '24 19:02 zzzDavid
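
For readers following the thread: the two printed states differ only in which end of the basis-state label wire 0 occupies. With 6 qubits, amplitude index 32 (binary 100000, as in the local output) and index 1 (binary 000001, as in the Docker output) both mean "wire 0 is set", read most-significant-first and least-significant-first respectively. A minimal sketch of converting between the two conventions by reversing the bits of each index follows. It is plain Python, the helper names are hypothetical, and it is not the fix adopted in Catalyst.

```python
def reverse_bits(index: int, num_qubits: int) -> int:
    """Reverse the num_qubits-bit binary representation of index."""
    result = 0
    for _ in range(num_qubits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def reorder_state(state: list, num_qubits: int) -> list:
    """Return the state with every basis index bit-reversed."""
    return [state[reverse_bits(i, num_qubits)] for i in range(len(state))]


# Hadamard on wire 0 of 6 qubits, with wire 0 as the most significant bit.
amp = 2 ** -0.5
msb_first = [0.0] * 64
msb_first[0] = amp
msb_first[32] = amp

lsb_first = reorder_state(msb_first, 6)
print([i for i, a in enumerate(lsb_first) if a != 0.0])  # [0, 1]
```

Applying reorder_state twice returns the original list, so the same helper converts in either direction.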
