How to get the unitary matrix of a given kernel
Greetings there,
Hope all are well. I would like to know how I can get the unitary matrix of a circuit in CUDA-Q. I've seen .sample() and .get_state() for counts and the state vector, but I haven't been able to find anything that returns the unitary matrix. Thanks in advance!
Tentatively moving this to 0.9 for tracking, but this is not yet confirmed as planned.
+1 for this since we need it for a collaboration, thanks.
I would make a few suggestions for this in the interest of performance. To get the unitary we have to do a tensor contraction, and it gets expensive very quickly as the entanglement increases. With that said, I would suggest using something like cotengra or cotengrust to find an optimal contraction order. Frankly, anything by Dr. Gray is usually very good for this type of operation.
Dr. Nguyen mentioned these have to be accessible in C++, so I'll have a look for any alternatives for this in C++.
Oh, and one more thing: it would be great if you don't do this as gates are added. Some packages (I believe Qiskit) do, and it adds a lot of overhead for deep circuits. I imagine it would be better to contract at the end, only if the user asks for the unitary matrix of the overall circuit. This also makes it easier to use JIT compilation or GPU acceleration for the contraction.
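To illustrate the "compose only at the end" idea, here is a minimal NumPy-only sketch (dense matrices, no contraction-path optimization; the embed helper and the gate list are just for this example):

import numpy as np

# Example dense gates. Qubit 0 is taken as the leftmost Kronecker
# factor; this ordering convention is an assumption of the sketch.
H = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=np.complex128)
I2 = np.eye(2, dtype=np.complex128)

def embed(gate, first_qubit, n_qubits):
    """Embed `gate`, acting on consecutive qubits starting at
    `first_qubit`, into the full 2**n_qubits dimensional space."""
    k = int(np.log2(gate.shape[0]))
    op = np.eye(1, dtype=np.complex128)
    for _ in range(first_qubit):
        op = np.kron(op, I2)
    op = np.kron(op, gate)
    for _ in range(first_qubit + k, n_qubits):
        op = np.kron(op, I2)
    return op

# Record the circuit as data, then compose everything once at the
# end instead of after every gate.
n = 2
circuit = [(H, 0), (CX, 0)]
U = np.eye(2**n, dtype=np.complex128)
for gate, first_qubit in circuit:
    U = embed(gate, first_qubit, n) @ U  # later gates multiply from the left

print(np.round(U, 3))

The point is simply that the circuit is recorded as a list and the full operator is only built once, on demand, rather than being updated after every gate.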
Greetings there,
Hope all are well. May I ask if there's an expected timeline for when this feature will be done?
You can use this hack in the meantime if it helps:
import cudaq
import numpy as np
from typing import List

num_qubits = 2
N = 2**num_qubits
U = np.zeros((N, N), dtype=np.complex128)
params = [1.0, 2.0]

@cudaq.kernel
def kernel(params: List[float], input_state: List[complex]):
    # Initialize the register directly from the supplied state vector.
    qubits = cudaq.qvector(input_state)
    rx(params[0], qubits[0])
    ry(params[1], qubits[1])
    y.ctrl(qubits[0], qubits[1])

# Reconstruct the unitary column by column: the j-th column is the
# circuit applied to the j-th computational basis state.
for j in range(N):
    state_j = np.zeros(N, dtype=np.complex128)
    state_j[j] = 1.0
    U[:, j] = np.array(cudaq.get_state(kernel, params, state_j), copy=False)

print(U)
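Note that this builds U column by column, so it costs one full state-vector simulation per basis state (2^n in total). That is fine for small circuits, but it grows quickly with the qubit count.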
@zohimchandani Thank you!
A bit of an unrelated question, sorry (too small to open a ticket for). How can I pass a list of qubits here? It doesn't let me slice or pass in a list:
import cudaq

# Create a `Kernel` that accepts a qubit as an argument
# and applies a Tdg gate to it.
target_kernel, qubit = cudaq.make_kernel(cudaq.qubit)
target_kernel.tdg(qubit)

# Create another `Kernel` that will apply `target_kernel`
# as a controlled operation.
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(3)

# Attempt to apply `target_kernel` controlled on qubits 0 and 1,
# targeting qubit 2; this is where passing a slice fails.
kernel.control(target_kernel, qubits[:2], qubits[2])
print(cudaq.draw(kernel))
I want to add a controlled Tdg gate, with qubits 0 and 1 being the control indices, and qubit 2 being the target.
See this for an example and let us know if that helps.
It is recommended that you switch to the new way of creating kernels with the @cudaq.kernel decorator as shown in the example.
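For reference, a doubly-controlled Tdg with the decorator API could look roughly like the sketch below. It follows the documented cudaq.control pattern, but the kernel names are made up here, and details such as qubits.front(2) should be checked against the CUDA-Q version you have installed:

import cudaq

@cudaq.kernel
def tdg_kernel(q: cudaq.qubit):
    # Adjoint of the T gate on a single qubit.
    t.adj(q)

@cudaq.kernel
def controlled_tdg():
    qubits = cudaq.qvector(3)
    # Apply `tdg_kernel` to qubit 2, controlled on qubits 0 and 1.
    cudaq.control(tdg_kernel, qubits.front(2), qubits[2])

print(cudaq.draw(controlled_tdg))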
Ohh, I have read that. I am wrapping CUDA-Q in my package and I use a UI similar to Qiskit's, which is why I need gates to be added through methods rather than all at once inside a decorated function. Is there a way around it?
Maybe the better question is how to do .ctrl with the way I'm creating the kernel. When I try circuit.x.ctrl it doesn't work, since circuit.x is a partial.
I'd really appreciate some help in making the UI similar to what Qiskit, Cirq, or TKET have. This is more like PennyLane, which is what I'm trying to avoid. I'm not really comfortable with using decorated functions to represent circuits; it makes them really hard to wrap and use externally the way I am.
import cudaq
n_qubits = 2
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(n_qubits)
kernel.h(qubits[0])
kernel.cx(qubits[0], qubits[1])
print(cudaq.draw(kernel))
result = cudaq.sample(kernel)
print(result)
Right, but then you get stuck with tdg, for instance; there is no ctdg. It's no longer an issue for me, as I added a native decomposition in my library down to CX and U3, which you fortunately support.
On a more related note, I was wondering how I could get the unitary of a circuit when using the syntax you showed (the one I use, hehe). Essentially, when you define a circuit like this:
import cudaq
n_qubits = 2
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(n_qubits)
kernel.h(qubits[0])
kernel.cx(qubits[0], qubits[1])
How do I get this kernel's unitary?
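In the meantime, the column-by-column trick above can be adapted to the builder API by preparing each computational basis state with X gates before the circuit of interest. A sketch, assuming bit q of the column index maps to qubit q (depending on CUDA-Q's ordering convention, the columns may come out in a bit-reversed permutation):

import cudaq
import numpy as np

n_qubits = 2
N = 2**n_qubits
U = np.zeros((N, N), dtype=np.complex128)

for j in range(N):
    # Build a fresh kernel that first prepares the basis state |j>
    # and then applies the circuit of interest.
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(n_qubits)
    for q in range(n_qubits):
        if (j >> q) & 1:  # assumed bit-to-qubit mapping
            kernel.x(qubits[q])
    kernel.h(qubits[0])
    kernel.cx(qubits[0], qubits[1])
    # The resulting state vector is the j-th column of the unitary.
    U[:, j] = np.array(cudaq.get_state(kernel))

print(np.round(U, 3))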
Was this not included in the 0.9.0 release? I think this feature would be nice; happy to work on it as well, @bettinaheim.
It wasn't. I'd really like to see this ASAP too.
Apologies for the incorrect labeling. Unfortunately, we needed to defer it.
@arulandu It would be great if you want to work on it!
A separate API very similar to draw makes sense here. Let me know if you want to give it a go, and we can give some more concrete pointers.
Greetings,
Hope all are well. May I ask if there has been any progress on this task?
I can work on this, but would it be possible to use quimb? It makes the work easier to finish and is more efficient than any manual approach, given the significant optimizations quimb provides, e.g., optimal contraction-path finding.
Shall I add this? I have done something similar for tequila and for quick; quimb is really nice for these use cases. Let me know and I'll add it in. One thing, though: I can only add it to the Python side.
Hi @ACE07-Sev - thanks for your interest in the issue. Generally, we try to avoid external dependencies unless the benefits significantly outweigh the cost (integration, maintenance, testing, etc.). quimb doesn't seem to match the criteria for this specific use case. Similarly, we try to maintain parity between C++ and Python features, instead of adding features to only one language.
Check out the Unitary Hack Instructions we added to the original description. In this case, it may even be easier to start with the C++ support and create simple bindings for Python.
Right, but the point I was trying to make is that quimb takes care of the simulation (i.e., the tensor contraction) through its high-level API, whereas if we do it in C++ we'd have to re-implement the whole thing from scratch and would lose the advantages that come with quimb: optimal contraction paths, caching (a huge time saver), and approximate tensor-network representations (MPS/MPO save a lot of simulation time, especially beyond 20 qubits or so).
The simulation part should be handled for you by the traceFromKernel call mentioned in the unitary hack summary, no?
This issue is still open and up for grabs for Unitary Hack.
I've opened a draft with an initial implementation of the feature. It should work with any single-qubit gate, but not with multi-qubit gates for now.