cuda-quantum
Potential performance issue when doing np.array on statevector representation
Required prerequisites
- [x] Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
- [x] Make sure you've read the documentation. Your issue may be addressed there.
- [x] Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
- [x] If possible, make a PR with a failing test to give us a starting point to work on!
Describe the bug
When retrieving the statevector with cudaq.get_state(kernel), converting the result to an np.array seems to take a long time.
Steps to reproduce the bug
I have the following Python file named test_np_time.py:
import sys
import cudaq
import argparse
import numpy as np

parser = argparse.ArgumentParser(description="np array timing script with --np-array and --qubits flags.")
parser.add_argument('--np-array', action='store_true', default=False,
                    help='Use NumPy array if this flag is set')
parser.add_argument('--qubits', type=int, default=1,
                    help='Number of qubits to use (default: 1)')
args = parser.parse_args()

cudaq.set_target('nvidia', option='mgpu')
print(f"Running on target {cudaq.get_target().name}")

qubit_count = args.qubits
print(f"With {qubit_count} qubits")

@cudaq.kernel
def kernel():
    qubits = cudaq.qvector(qubit_count)
    h(qubits[0])
    for i in range(1, qubit_count):
        x.ctrl(qubits[0], qubits[i])
    mz(qubits)

# print(result)
re = cudaq.get_state(kernel)

if args.np_array:
    print("Doing np.array:")
    # Do something new here
    sv = np.array(re)
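
To separate the conversion cost from process startup and the simulation itself, the two calls could be timed individually instead of relying on the shell's `time`. The following is only a minimal sketch: `timed` is a hypothetical helper (not part of cudaq or NumPy), and it is exercised here on a stand-in workload so that it runs without a GPU.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for this issue; not part of cudaq or NumPy.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs anywhere.
total, seconds = timed(sum, range(1_000_000))
print(f"stand-in call took {seconds:.6f} s")

# In the script above, the same helper would wrap the two steps, e.g.:
#   state, t_sim = timed(cudaq.get_state, kernel)
#   sv, t_conv = timed(np.array, state)
#   print(f"get_state: {t_sim:.3f} s, np.array: {t_conv:.3f} s")
```

With per-call timings, the number reported for `np.array` no longer includes interpreter startup, target initialization, or the kernel simulation.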
Testing this on GH200 with 32 qubits, I see:
$ time python3 test_np_time.py --qubits 32
Running on target nvidia
With 32 qubits
real 0m15.617s
user 0m11.144s
sys 0m12.847s
$ time python3 test_np_time.py --qubits 32 --np-array
Running on target nvidia
With 32 qubits
Doing np.array:
real 0m20.968s
user 0m13.812s
sys 0m15.527s
Expected behavior
I would expect np.array to be quick in this scenario. Going from 15.6 s to 21.0 s of wall-clock time, roughly 5 s extra, seems excessive.
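
For context on how much data the conversion has to move, the size of a 32-qubit statevector can be worked out directly. This is a rough sketch under stated assumptions: the amplitude precision is not given in this report, so both single-precision (8 bytes per complex amplitude) and double-precision (16 bytes) cases are shown, and the extra time is taken as the difference between the two `real` values above, which come from single runs of separate processes and are therefore noisy.

```python
def state_bytes(qubits, bytes_per_amplitude):
    """Bytes needed to hold 2**qubits complex amplitudes."""
    return (2 ** qubits) * bytes_per_amplitude


# Wall-clock ("real") times copied from the two runs in this report.
real_without_np = 15.617
real_with_np = 20.968
extra_seconds = real_with_np - real_without_np

# Precision is an assumption; the report does not say which one was used.
for label, width in (("complex64", 8), ("complex128", 16)):
    size_gib = state_bytes(32, width) / 2 ** 30
    rate = size_gib / extra_seconds
    print(f"{label}: {size_gib:.0f} GiB, implied rate {rate:.2f} GiB/s")
```

If the state is held in single precision, that is 32 GiB, and the observed difference would correspond to an effective rate of about 6 GiB/s for whatever copies the conversion performs.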
Is this a regression? If it is, put the last known working version (or commit) here.
Not sure.
Environment
- CUDA-Q version: nvq++ Version cu12-0.10.0 (https://github.com/NVIDIA/cuda-quantum 857dd2ce0a783c32416af8fba8664ff30f9ddc47)
- CUDA-Q docker container: cu12-0.10.0 container
Suggestions
No response