cuQuantum: how can I simulate a state vector across multiple machines?

I want to simulate a state vector. Could you please point me to examples or documentation? Any advice is welcome. Thanks in advance.

Aug 13 '25 15:08 CoolMLAI

We have a samples folder that can help you start: https://github.com/NVIDIA/cuQuantum/tree/main/samples/custatevec/samples_mpi

Aug 13 '25 16:08 daniellowell

Hi @CoolMLAI ,

Thank you for your inquiry. The example above is also applicable for multi-node runs.

We provide the sample script samples/bindings/custatevec/distributed_index_bit_swap_mpi.py for performing distributed state vector operations in Python using MPI.

For details about its design and usage, please refer to the cuQuantum SDK documentation: Distributed Index Bit Swap API. Our distributed state vector simulation API supports configurations with a single GPU per process.

Aug 13 '25 21:08 ymagchi

@ymagchi Thanks. Actually, I have used CUDA-Q for multi-GPU (mgpu) state vector simulation, and I noticed that the examples above use cuQuantum. What is the difference between CUDA-Q and cuQuantum? 😁 Could you please let me know where I can find multi-node examples using CUDA-Q?

Aug 13 '25 23:08 CoolMLAI

What is the difference between CUDA-Q and cuQuantum? 😁 Could you please let me know where I can find multi-node examples using CUDA-Q?

CUDA-Q is a quantum programming framework, while cuQuantum (specifically its cuStateVec library) provides GPU-accelerated components for quantum circuit simulators. For multi-node runs, please refer to CUDA-Q: Multi-GPU multi-node.

We maintain a separate repository for CUDA-Q, so we recommend directing any detailed questions regarding CUDA-Q to that repository. https://github.com/NVIDIA/cuda-quantum/discussions

Aug 13 '25 23:08 ymagchi

Hi @ymagchi, thanks for your kind reply. I followed your guide, but it seems that CUDA-Q doesn't provide multi-node simulation for state vectors with a large number of qubits.

I want to simulate circuits with a large number of qubits using cuQuantum. Is that possible, and which examples should I refer to? Are there any direct examples?

Thanks.

Aug 17 '25 00:08 CoolMLAI

I followed your guide, but it seems that CUDA-Q doesn't provide multi-node simulation for state vectors with a large number of qubits.

CUDA-Q uses cuQuantum as its backend solver and supports multi-node state vector simulations, just as cuQuantum does. While it is possible to use cuQuantum directly, if you already have CUDA-Q code, I recommend starting with CUDA-Q.

Please note that CUDA-aware MPI must be enabled across nodes as a prerequisite. A Docker environment is also available: https://nvidia.github.io/cuda-quantum/latest/using/install/local_installation.html#distributed-computing-with-mpi.

Aug 17 '25 14:08 ymagchi

Thanks, @ymagchi. Actually, I successfully launched the code on multiple nodes, but it takes a very long time: even after 2 hours it had not finished. GPT said this is due to bandwidth. On my machines the speed is 5 GBps, but GPT suggested 100 GBps.

I didn't use Docker, but I wonder whether using the Docker environment would solve this issue. Or should I connect the cluster nodes over a LAN?

Aug 17 '25 22:08 CoolMLAI

I didn't use Docker, but I wonder whether using the Docker environment would solve this issue.

In general, the inter-node communication bandwidth is determined by the cluster and hardware configuration. From my perspective, using Docker is unlikely to significantly improve the bandwidth. I recommend running a simulation with a relatively small problem size to verify that the end-to-end workflow completes successfully.
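To get a feel for how much the inter-node bandwidth matters, here is a rough back-of-the-envelope sketch. It assumes that a gate acting on a qubit that is distributed across processes makes each process exchange about half of its local sub state vector with a peer, and it ignores latency, overlap with compute, and intra-node links. The function name `exchange_seconds` and the 34-qubit, 2-process example are hypothetical and are not taken from the thread.

```python
def exchange_seconds(n_qubits: int, n_ranks: int, bandwidth_gb_per_s: float,
                     double_precision: bool = False) -> float:
    """Estimate the time for one pairwise exchange of half a sub state vector.

    Assumption: one exchange moves half of the data held by each process.
    """
    bytes_per_amplitude = 16 if double_precision else 8
    # The full state vector is split evenly across the processes.
    local_bytes = (2 ** n_qubits) * bytes_per_amplitude // n_ranks
    exchanged_bytes = local_bytes // 2
    return exchanged_bytes / (bandwidth_gb_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical setup: 34 qubits in single precision on 2 processes.
    for bandwidth in (5, 100):
        seconds = exchange_seconds(34, 2, bandwidth)
        print(f"{bandwidth:>3} GB/s: about {seconds:.2f} s per exchange")
```

Under these assumptions the cost scales linearly with the inverse of the bandwidth, so a circuit that triggers many such exchanges spends proportionally longer on a slower interconnect.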

Aug 18 '25 00:08 ymagchi

Thanks very much, @ymagchi

I have one more question. I saw an issue that mentioned sampling or simulating circuits with a large number of qubits. What is the maximum number of qubits for cuQuantum?

Aug 18 '25 05:08 CoolMLAI

The maximum problem size depends on the target cluster. For state vector simulations, an n-qubit state vector requires (2 ** n) * 8 bytes in single precision and (2 ** n) * 16 bytes in double precision. For instance, a single GPU with 180 GB of memory can hold up to 34 qubits in single precision.
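The memory formula can be checked with a few lines of Python. This is a minimal sketch: the helper names `statevector_bytes` and `max_qubits` are made up for illustration, 180 GB is taken as 180 * 10**9 bytes, and any extra workspace the simulator needs is ignored.

```python
def statevector_bytes(n_qubits: int, double_precision: bool = False) -> int:
    """Bytes needed to store all 2**n complex amplitudes."""
    bytes_per_amplitude = 16 if double_precision else 8
    return (2 ** n_qubits) * bytes_per_amplitude


def max_qubits(memory_bytes: int, double_precision: bool = False) -> int:
    """Largest qubit count whose state vector fits in memory_bytes."""
    n = 0
    while statevector_bytes(n + 1, double_precision) <= memory_bytes:
        n += 1
    return n


if __name__ == "__main__":
    gpu_memory = 180 * 10**9  # assumption: 180 GB means 180e9 bytes
    print("single precision:", max_qubits(gpu_memory))
    print("double precision:", max_qubits(gpu_memory, double_precision=True))
```

Each additional qubit doubles the requirement, so doubling the total memory (for example by doubling the number of GPUs) adds only one qubit.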

Aug 18 '25 17:08 ymagchi

Hi, @ymagchi Thanks for your support.

I tested CUDA-Q for state vector simulation, but I found that its index distribution is not correct for circuits with a large number of qubits. So I would like to use cuQuantum directly to simulate a large state vector on multiple GPUs. I used the distributed_index_bit_swap_mpi.py script you mentioned, but the simulation never finishes. Could you please help me figure out what I am doing wrong? Thanks.

Aug 22 '25 00:08 CoolMLAI

I think it would be best to share a reproducer with CUDA-Q. The way state vectors are distributed among GPUs is consistent between CUDA-Q and the cuStateVec library.
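For reference, the layout in question can be sketched as follows. This is an illustration rather than library code: it assumes the state vector is split evenly across a power-of-two number of processes, with the most significant index bits selecting the process and the remaining bits addressing an element inside that process's sub state vector. The helper name `locate_amplitude` is hypothetical.

```python
def locate_amplitude(global_index: int, n_qubits: int, n_ranks: int) -> tuple[int, int]:
    """Map a global amplitude index to (rank, local index).

    Assumption: the upper index bits pick the process and the lower bits
    address the element within that process's sub state vector.
    """
    n_global_bits = n_ranks.bit_length() - 1
    if n_ranks < 1 or (1 << n_global_bits) != n_ranks:
        raise ValueError("n_ranks must be a power of two")
    if not 0 <= global_index < (1 << n_qubits):
        raise ValueError("global_index out of range")
    n_local_bits = n_qubits - n_global_bits
    rank = global_index >> n_local_bits
    local_index = global_index & ((1 << n_local_bits) - 1)
    return rank, local_index


if __name__ == "__main__":
    # 4 qubits on 4 processes: 2 global bits and 2 local bits.
    for index in range(16):
        print(f"{index:04b} ->", locate_amplitude(index, n_qubits=4, n_ranks=4))
```

Comparing a mapping like this against the amplitudes each process actually reports is one way to build a small reproducer for a suspected index distribution problem.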

As for the Python file, could you kindly share where exactly the execution stops and what kind of computing environment you are using? CUDA-aware MPI is required to run it properly.

Aug 22 '25 01:08 ymagchi

Yes, I installed CUDA-aware MPI and confirmed that the state vector is distributed across the GPUs by checking GPU usage. However, the Python script stopped while applying gate operations. I can share my code. Would you please let me know your email address? 🙂

Aug 22 '25 01:08 CoolMLAI

I am still in the process of fully understanding the issue, and I would greatly appreciate it if you could kindly confirm whether the sample file works correctly when executed without any modifications.

Aug 22 '25 17:08 ymagchi