Potential performance issue when doing np.array on statevector representation

Open mitchdz opened this issue 8 months ago • 0 comments

Required prerequisites

[x] Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
[x] Make sure you've read the documentation. Your issue may be addressed there.
[x] Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
[x] If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

When retrieving the statevector with cudaq.get_state(kernel), converting the result to a NumPy array with np.array seems to take a long time (about 5 s of extra wall-clock time at 32 qubits in the runs below).

Steps to reproduce the bug

I have the following Python file, named test_np_time.py:

import sys
import cudaq
import argparse
import numpy as np

parser = argparse.ArgumentParser(description="np array timing script with --np-array and --qubits flags.")

parser.add_argument('--np-array', action='store_true', default=False,
                    help='Use NumPy array if this flag is set')

parser.add_argument('--qubits', type=int, default=1,
                    help='Number of qubits to use (default: 1)')

args = parser.parse_args()


cudaq.set_target('nvidia', option='mgpu')

print(f"Running on target {cudaq.get_target().name}")
qubit_count = args.qubits

print(f"With {qubit_count} qubits")

@cudaq.kernel
def kernel():
    qubits = cudaq.qvector(qubit_count)
    h(qubits[0])
    for i in range(1, qubit_count):
        x.ctrl(qubits[0], qubits[i])
    mz(qubits)

# Retrieve the final state from the simulator.
re = cudaq.get_state(kernel)
if args.np_array:
    print("Doing np.array:")
    # Convert the state to a NumPy array (this is the step under test).
    sv = np.array(re)

Testing this on GH200 with 32 qubits, I see:

$ time python3 test_np_time.py --qubits 32
Running on target nvidia
With 32 qubits

real    0m15.617s
user    0m11.144s
sys     0m12.847s
$ time python3 test_np_time.py --qubits 32 --np-array
Running on target nvidia
With 32 qubits
Doing np.array:

real    0m20.968s
user    0m13.812s
sys     0m15.527s
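
The figures above come from timing the whole process, so they also include interpreter start-up and the simulation itself. A minimal sketch of how the conversion could be timed in isolation is shown here. The stopwatch helper is hypothetical (it is not part of cudaq or NumPy), and the commented usage assumes the kernel and imports from the script above.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


# Hypothetical usage inside the reproducer (needs cudaq and a GPU):
#
#   timings = {}
#   with stopwatch("get_state", timings):
#       re = cudaq.get_state(kernel)
#   with stopwatch("np.array", timings):
#       sv = np.array(re)
#   print(timings)

# Stand-alone demonstration that runs without cudaq.
timings = {}
with stopwatch("sleep", timings):
    time.sleep(0.01)
print(timings)
```

Separating the two steps would show how much of the difference between the runs is really spent in np.array, rather than inferring it from two single end-to-end measurements.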

Expected behavior

I would expect np.array to be quick in this scenario. Adding roughly 5 s of wall-clock time (15.6 s without the conversion, 21.0 s with it) seems excessive.
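
For scale, a rough sketch of how much data a dense 32-qubit statevector holds. The amplitude precision used by the simulator in this run is not stated in the report, so both single- and double-precision complex are shown as assumptions, and statevector_bytes is a hypothetical helper written for this illustration.

```python
# Bytes per complex amplitude for the two precisions considered here.
# Which one applies to the run above is an assumption, not a measured fact.
BYTES_PER_AMPLITUDE = {"complex64": 8, "complex128": 16}


def statevector_bytes(num_qubits, precision="complex64"):
    """Size in bytes of a dense statevector with 2**num_qubits amplitudes."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE[precision]


for precision in ("complex64", "complex128"):
    gib = statevector_bytes(32, precision) / 2**30
    print(f"32 qubits, {precision}: {gib:.0f} GiB")
# prints:
# 32 qubits, complex64: 32 GiB
# 32 qubits, complex128: 64 GiB
```

If np.array has to copy the full state into a new host buffer, the extra time would correspond to moving tens of GiB, which may be useful context when deciding what "quick" should mean here.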

Is this a regression? If it is, put the last known working version (or commit) here.

Not sure.

Environment

CUDA-Q version: nvq++ Version cu12-0.10.0 (https://github.com/NVIDIA/cuda-quantum 857dd2ce0a783c32416af8fba8664ff30f9ddc47)
CUDA-Q docker container: cu12-0.10.0 container

Suggestions

No response

Apr 09 '25 16:04 mitchdz