Please help me simulate a state vector across multiple machines
I want to simulate the state vector. Could you please provide me with links to examples or documentation for this? Any advice is welcome. Thanks in advance.
We have a samples folder that can help you get started: https://github.com/NVIDIA/cuQuantum/tree/main/samples/custatevec/samples_mpi
Hi @CoolMLAI ,
Thank you for your inquiry. The example above is also applicable to multi-node runs.
We provide the sample script samples/bindings/custatevec/distributed_index_bit_swap_mpi.py for performing distributed state vector operations in Python using MPI.
For details about its design and usage, please refer to the cuQuantum SDK documentation: Distributed Index Bit Swap API. Our distributed state vector simulation API also supports single-GPU/single-process configurations.
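For intuition about what the index bit swap in that sample does, here is a single-process NumPy sketch: swapping two index bits of a state vector is just a permutation of its amplitudes (the distributed version performs this permutation across GPUs/processes). The helper `swap_index_bits` below is purely illustrative and is not part of the cuStateVec API.

```python
import numpy as np

def swap_index_bits(sv, b0, b1):
    """Swap index bits b0 and b1 of a state vector of length 2**n.

    The amplitude stored at basis index i moves to the index obtained
    by exchanging bits b0 and b1 of i.
    """
    n = int(np.log2(sv.size))
    # View the flat vector as an n-dimensional (2, 2, ..., 2) tensor.
    t = sv.reshape([2] * n)
    # Axis 0 corresponds to the most significant index bit (n-1),
    # so index bit b maps to axis n-1-b.
    t = np.swapaxes(t, n - 1 - b0, n - 1 - b1)
    return np.ascontiguousarray(t).reshape(-1)

# 3-qubit example: amplitude 1 at index 0b001
sv = np.zeros(8, dtype=np.complex64)
sv[0b001] = 1.0
out = swap_index_bits(sv, 0, 2)
print(int(np.argmax(np.abs(out))))  # 4, i.e. the amplitude moved to 0b100
```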
@ymagchi Thanks. Actually, I have already used it for multi-GPU state vector simulation, and I noticed the examples above use cuQuantum. What is the difference between CUDA-Q and cuQuantum, 😁? Could you please let me know where I can find multi-node examples using CUDA-Q?
> What is the difference between CUDA-Q and cuQuantum, 😁? Could you please let me know where I can find multi-node examples using CUDA-Q?
CUDA-Q is a quantum programming framework, while cuQuantum (the cuStateVec library) provides GPU-accelerated quantum circuit simulator components. Please refer to CUDA-Q: Multi-GPU multi-node for multi-node runs.
We maintain a separate repository for CUDA-Q, so we recommend directing any detailed questions regarding CUDA-Q to that repository. https://github.com/NVIDIA/cuda-quantum/discussions
Hi @ymagchi, thanks for your kind reply. I followed your guide, but it seems that CUDA-Q doesn't provide multi-node simulation for large-qubit state vectors.
I want to simulate high-qubit-count circuits using cuQuantum. Is it possible to do so, and which examples should I refer to? Are there any direct examples?
Thanks.
> I followed your guide, but it seems that CUDA-Q doesn't provide multi-node simulation for large-qubit state vectors.
CUDA-Q leverages cuQuantum as its backend solver and supports multi-node state vector simulations just as cuQuantum does. While it is possible to use cuQuantum directly, if you already have CUDA-Q code, I recommend starting with CUDA-Q.
Please note that CUDA-aware MPI must be enabled across nodes as a prerequisite. A Docker environment is also available: https://nvidia.github.io/cuda-quantum/latest/using/install/local_installation.html#distributed-computing-with-mpi.
Thanks, @ymagchi. Actually, I successfully launched the code on multiple nodes, but it takes a very long time; even after 2 hours it hadn't finished. GPT said this is due to bandwidth: my machine's interconnect runs at 5 GB/s, but GPT suggested 100 GB/s.
I didn't use Docker, but I wonder whether using a Docker environment would solve this issue. Or should I connect the cluster nodes over a LAN?
> I didn't use Docker, but I wonder whether using a Docker environment would solve this issue.
In general, the inter-node communication bandwidth is determined by the cluster and hardware configuration. From my perspective, using Docker is unlikely to significantly improve the bandwidth. I recommend running a simulation with a relatively small problem size to verify that the end-to-end workflow completes successfully.
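As a stand-in for that small-scale sanity check, the end-to-end logic (initialize a state vector, apply gates, check the result) can be exercised on a single process with plain NumPy before involving MPI at all. This is only an illustrative sketch, not the distributed sample itself:

```python
import numpy as np

def apply_1q_gate(sv, gate, target, n):
    """Apply a 2x2 gate to the `target` qubit of an n-qubit state vector."""
    t = sv.reshape([2] * n)
    axis = n - 1 - target          # axis 0 is the most significant index bit
    t = np.moveaxis(t, axis, 0)    # bring the target axis to the front
    t = np.tensordot(gate, t, axes=([1], [0]))
    t = np.moveaxis(t, 0, axis)
    return np.ascontiguousarray(t).reshape(-1)

n = 3
sv = np.zeros(2**n, dtype=np.complex64)
sv[0] = 1.0                        # start in |000>
H = np.array([[1, 1], [1, -1]], dtype=np.complex64) / np.sqrt(2)
for q in range(n):                 # H on every qubit -> uniform superposition
    sv = apply_1q_gate(sv, H, q, n)
print(np.allclose(np.abs(sv) ** 2, 1 / 2**n))  # True: all probabilities equal
```

If this small case behaves as expected, the same circuit can then be scaled up and run through the MPI sample to isolate whether the hang is in the distributed layer.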
Thanks very much, @ymagchi
I have one more question. I saw one issue mention high-qubit-count circuit sampling and simulation, but what is the qubit limit for cuQuantum?
The maximum possible problem size depends on the target cluster. For state vector simulations, an n-qubit state vector requires (2 ** n) * 8 bytes in single precision and (2 ** n) * 16 bytes in double precision. For instance, one GPU with 180 GB of memory can hold up to 34 qubits in single precision.
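That sizing rule can be checked with a few lines of Python (8 bytes per amplitude for single-precision complex, 16 for double):

```python
def sv_bytes(n_qubits, double_precision=False):
    """Memory needed to hold an n-qubit state vector."""
    return (2 ** n_qubits) * (16 if double_precision else 8)

def max_qubits(mem_bytes, double_precision=False):
    """Largest n such that the full state vector fits in mem_bytes."""
    n = 0
    while sv_bytes(n + 1, double_precision) <= mem_bytes:
        n += 1
    return n

print(max_qubits(180 * 10**9))   # 34 qubits in single precision
print(sv_bytes(34) / 10**9)      # ~137.4 GB, which fits in 180 GB
```

Each extra qubit doubles the memory requirement, which is why multi-node runs split the state vector across GPUs.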
Hi @ymagchi, thanks for your support.
I tested CUDA-Q to simulate the state vector, but I found its index distribution is not correct for high-qubit-count circuits, so I would like to use cuQuantum directly to simulate a large state vector on multiple GPUs. I used the code distributed_index_bit_swap_mpi.py that you mentioned, but the simulation never finishes. Could you please help me figure out what I'm doing wrong? Thanks.
I think it would be best to share a reproducer with CUDA-Q. The way state vectors are distributed among GPUs is consistent between CUDA-Q and the cuStateVec library.
As for the Python file, could you kindly share where exactly the execution stops and what kind of computing environment you are using? To run it properly, CUDA-aware MPI is required.
Yes, I installed CUDA-aware MPI and confirmed the state vector is distributed across the GPUs by checking GPU memory usage, but the Python file stops when applying gate operations. I can share my code; could you please let me know your email address? 🙂
I am still in the process of fully understanding the issue, and I would greatly appreciate it if you could kindly confirm whether the sample file works correctly when executed without any modifications.