
Problem with Qiskit Aer parallelization using GPUs

dotslaser opened this issue 3 years ago · 4 comments

Versions:

  • qiskit-aer: 0.11.0
  • qiskit-terra: 0.21.0
  • mpirun (Open MPI): 4.0.3
  • python: 3.8.10

Description:

Hi, I'm trying to replicate the code example in the Qiskit Aer documentation (distributing the Quantum Volume algorithm using MPI and GPUs) as seen here: Running-with-multiple-gpus-andor-multiple-nodes

Code:

This is the code I'm running:

```python
import qiskit
from qiskit import IBMQ
from qiskit.providers.aer import AerSimulator
from qiskit import transpile
from qiskit import execute, QuantumCircuit
from qiskit.circuit.library import QuantumVolume

qubit=24
sim = AerSimulator(method='statevector', device='GPU')
circ = transpile(QuantumVolume(qubit, 10, seed = 0))
circ.measure_all()
result = execute(circ, sim, shots=100, blocking_enable=True, blocking_qubits=23).result()

print(result)
```

Error

This is the error I get:

```
Read -1, expected 67108864, errno = 14
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Invalid permissions (2)
Failing at address: 0x7f7988000000
Read -1, expected 67108864, errno = 14
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Invalid permissions (2)
Failing at address: 0x7fadb4000000
[ 0]
[ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f79f420d090]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7fae1d7c2090]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18b8f5)[0x7f79f43558f5]
[ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x31c4)[0x7f79e26531c4]
[ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1c6)[0x7f79e2635926]
[ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x1a9)[0x7f79e262e429]
[ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x18b8f5)[0x7fae1d90a8f5]
[ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x31c4)[0x7fae0c0481c4]
[ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1c6)[0x7fae0c02a926]
[ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x1a9)[0x7fae0c023429]
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7f79e2654ed5]
[ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x53a3)[0x7f79e26553a3]
[ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fae0c049ed5]
[ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x53a3)[0x7fae0c04a3a3]
[ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fae0ef98854]
[ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xb5)[0x7fae0ef9f315]
[ 9] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_wait+0x228)[0x7fae0f42f9f8]
[10]
[ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7f79e59e3854]
[ 8] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Wait+0x58)[0x7fae0f472a88]
[11] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x271e89)[0x7fae11905e89]
[12] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x27019d)[0x7fae1190419d]
[13] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0xe4bb9)[0x7fae11778bb9]
[14] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x4343dc)[0x7fae11ac83dc]
[15] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x4362c4)[0x7fae11aca2c4]
/lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xb5)[0x7f79e59ea315]
[ 9]
[16] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x436ce9)[0x7fae11acace9]
/lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_wait+0x228)[0x7f79e5e7a9f8]
[10] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Wait+0x58)[0x7f79e5ebda88]
[11] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x271e89)[0x7f79e8350e89]
[12] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x27019d)[0x7f79e834f19d]
[13] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0xe4bb9)[0x7f79e81c3bb9]
[14] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x4343dc)[0x7f79e85133dc]
[15] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x4362c4)[0x7f79e85152c4]
[16] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x436ce9)[0x7f79e8515ce9]
[17] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0xe5cc1)[0x7f79e81c4cc1]
[18] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x43dbce)[0x7f79e851cbce]
[17] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0xe5cc1)[0x7fae11779cc1]
[18] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x43dbce)[0x7fae11ad1bce]
[19] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x43f420)[0x7fae11ad3420]
[20] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x43f6f2)[0x7fae11ad36f2]
[21] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x1a280d)[0x7fae1183680d]
[22] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x1a2f44)[0x7fae11836f44]
[23] python(PyCFunction_Call+0x59)[0x5f3989]
[24]
[19] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x43f420)[0x7f79e851e420]
[20] python(_PyObject_MakeTpCall+0x29e)[0x5f3e1e]
[25] python[0x50b158]
[26] python(PyObject_Call+0x1f7)[0x5f3547]
[27] python[0x59d13c]
[28] python(_PyObject_MakeTpCall+0x29e)[0x5f3e1e]
[29] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x43f6f2)[0x7f79e851e6f2]
[21] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x1a280d)[0x7f79e828180d]
[22] /home/ubuntu/.local/lib/python3.8/site-packages/qiskit/providers/aer/backends/controller_wrappers.cpython-38-x86_64-linux-gnu.so(+0x1a2f44)[0x7f79e8281f44]
[23] python(PyCFunction_Call+0x59)[0x5f3989]
[24] python(_PyObject_MakeTpCall+0x29e)[0x5f3e1e]
[25] python[0x50b158]
python(_PyEval_EvalFrameDefault+0x58e6)[0x570266]
*** End of error message ***
[26] python(PyObject_Call+0x1f7)[0x5f3547]
[27] python[0x59d13c]
[28] python(_PyObject_MakeTpCall+0x29e)[0x5f3e1e]
[29] python(_PyEval_EvalFrameDefault+0x58e6)[0x570266]
*** End of error message ***

Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.

mpirun noticed that process rank 1 with PID 0 on node ip-XX exited on signal 11 (Segmentation fault).
```
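The first line of the log contains two numbers that the Python standard library can decode: errno 14 is `EFAULT` ("Bad address") on Linux, and 67108864 bytes is exactly 64 MiB. The sketch below only decodes those values; reading the byte count as a block of 16-byte complex amplitudes is an assumption, since the log does not say what the buffer holds.

```python
import errno
import os

# Values copied from the "Read -1, expected 67108864, errno = 14" line of the log.
ERRNO_FROM_LOG = 14
BYTES_EXPECTED = 67108864

name = errno.errorcode[ERRNO_FROM_LOG]  # symbolic errno name on this platform
message = os.strerror(ERRNO_FROM_LOG)   # human-readable description
# Assumption: if the buffer held complex128 amplitudes (16 bytes each),
# it would contain this many of them, i.e. a 2**22-amplitude block.
amplitudes = BYTES_EXPECTED // 16
print(name, message, amplitudes.bit_length() - 1)
```

On Linux this prints `EFAULT Bad address 22`.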

Things I've tried:

I tried creating a simple circuit to test the parallelization. The segmentation fault seems to happen whenever a gate acts on the last qubit. For example, on a 24-qubit circuit, putting a gate such as a Hadamard on the last qubit (qc.h(23)) causes a segmentation fault. The other qubits seem unaffected: I can put arbitrary gates on them and the simulation works.
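Why only the last qubit would matter can be illustrated with a simplified model. The sketch below assumes, hypothetically, that the statevector is split into equal contiguous slices, one per MPI process, so that the highest-index qubits select the owning process; the function names are illustrative and are not part of Qiskit Aer's API. Under that assumption, a one-qubit gate on a rank-selecting qubit pairs amplitudes held by different processes and therefore needs MPI communication, while gates on the other qubits stay process-local. This is at least consistent with the stack trace passing through `PMPI_Wait`.

```python
def local_qubit_count(num_qubits: int, num_processes: int) -> int:
    """Number of low-order qubits whose amplitudes stay inside one process.

    Hypothetical model: the 2**num_qubits amplitudes are split into
    num_processes equal, contiguous slices, so the top log2(num_processes)
    qubits select which process owns an amplitude.
    """
    if num_processes < 1 or num_processes & (num_processes - 1):
        raise ValueError("this sketch only models power-of-two process counts")
    global_qubits = num_processes.bit_length() - 1  # log2(num_processes)
    if global_qubits > num_qubits:
        raise ValueError("more processes than amplitudes")
    return num_qubits - global_qubits


def owner_rank(index: int, num_qubits: int, num_processes: int) -> int:
    """Rank that holds amplitude `index` under the contiguous-slice model."""
    return index >> local_qubit_count(num_qubits, num_processes)


def gate_needs_exchange(target: int, num_qubits: int, num_processes: int) -> bool:
    """A one-qubit gate on `target` combines amplitudes i and i ^ (1 << target).

    The two amplitudes lie on different ranks exactly when `target` is one of
    the top (rank-selecting) qubits, which would require MPI traffic.
    """
    return target >= local_qubit_count(num_qubits, num_processes)


if __name__ == "__main__":
    n, procs = 24, 2
    print([q for q in range(n) if gate_needs_exchange(q, n, procs)])  # [23]
    # The partner of amplitude 0 for a gate on qubit 23 lives on rank 1.
    print(owner_rank(0 ^ (1 << 23), n, procs))  # 1
```

With 2 processes only qubit 23 crosses ranks; with 4 processes (as in the reply below) qubits 22 and 23 would.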

Thanks a lot!!

dotslaser, Jul 07 '22 07:07

@jakelishman @doichanj I've tried different configurations of MPI and CUDA, and it seems to me the problem is in Qiskit Aer. Any algorithm fails to distribute across GPUs if the circuit applies an operation to its last qubit: any gate on the last qubit seems to make the simulation fail. Could you fix this issue, or point out something I'm doing wrong? Thanks a lot!!

dotslaser, Jul 08 '22 11:07

I could not reproduce this issue. Please provide more information (number of processes, number of GPUs, GPU and CPU memory sizes, etc.). Could you test with a smaller blocking_qubits value? I think blocking_qubits=23 is too large for a 24-qubit circuit; for example, if you use 4 processes, blocking_qubits should be less than or equal to 22. (If you set 23 with 4 processes, Qiskit Aer aborts with a message like `ERROR: [Experiment 0] cache blocking : blocking_qubits is to large to parallelize with 4 processes`.)
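The example in this reply (24 qubits with 4 processes allows blocking_qubits of at most 22) is consistent with the bound blocking_qubits ≤ num_qubits − log2(num_processes). The helpers below are a hypothetical pre-flight check written under that assumption; they are not Aer's own validation, which may apply further constraints.

```python
def max_blocking_qubits(num_qubits: int, num_processes: int) -> int:
    """Assumed upper bound: num_qubits - log2(num_processes).

    Inferred from the example above (24 qubits, 4 processes -> 22);
    Aer's real validation may differ.
    """
    if num_processes < 1 or num_processes & (num_processes - 1):
        raise ValueError("this sketch only models power-of-two process counts")
    return num_qubits - (num_processes.bit_length() - 1)


def check_blocking_qubits(num_qubits: int, num_processes: int, blocking_qubits: int) -> None:
    """Hypothetical check to run before submitting the job."""
    limit = max_blocking_qubits(num_qubits, num_processes)
    if blocking_qubits > limit:
        raise ValueError(
            f"blocking_qubits={blocking_qubits} is too large for "
            f"{num_qubits} qubits on {num_processes} processes (max {limit})"
        )


check_blocking_qubits(24, 4, 22)  # accepted
try:
    check_blocking_qubits(24, 4, 23)  # rejected, mirroring the Aer error quoted above
except ValueError as err:
    print(err)
```

Under this assumed rule, blocking_qubits=23 would still be accepted for a 24-qubit circuit on 2 processes, which is why the number of processes matters for diagnosing the report.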

doichanj, Jul 13 '22 02:07

Hi! I'm using g5.xlarge instances on AWS:

  • 4 vCPUs (AMD EPYC 7R32)
  • NVIDIA A10G Tensor Core (24 GB)
  • 16 GB RAM
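For scale, a back-of-the-envelope memory estimate, assuming the statevector is stored as double-precision complex amplitudes (16 bytes each) and split evenly across processes; it ignores any extra buffers Aer may allocate.

```python
MiB = 1 << 20


def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a full statevector: 2**n amplitudes, complex128 by default."""
    return (1 << num_qubits) * bytes_per_amplitude


def per_process_bytes(num_qubits: int, num_processes: int, bytes_per_amplitude: int = 16) -> int:
    """Share held by each process, assuming an even split."""
    return statevector_bytes(num_qubits, bytes_per_amplitude) // num_processes


print(statevector_bytes(24) // MiB)     # 256
print(per_process_bytes(24, 2) // MiB)  # 128
```

A full 24-qubit statevector is therefore about 256 MiB, far below the 24 GB of the A10G.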

In case there is an error in the way I'm building Qiskit Aer: I installed CUDA 11.7 following these instructions and built Qiskit Aer with `python ./setup.py bdist_wheel -- -DAER_MPI=True -DAER_THRUST_BACKEND=CUDA`.

I've tried lowering blocking_qubits, but it doesn't seem to make any difference: I get the same segmentation fault as reported above.

Thanks a lot!!

dotslaser, Jul 13 '22 06:07

In case it helps, these are all the steps I follow on a new AWS machine to install Qiskit Aer with GPU support:

NVIDIA toolkit installation

```shell
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
```

Qiskit Aer compilation

```shell
sudo apt -y install build-essential libopenblas-dev git openmpi-bin python3-pip python-is-python3
git clone https://github.com/Qiskit/qiskit-aer
cd qiskit-aer
export PATH="/home/ubuntu/.local/bin:$PATH"
pip install -r requirements-dev.txt
source ~/.bashrc
export CUDACXX=/usr/local/cuda-11.7/bin/nvcc
python ./setup.py bdist_wheel -- -DAER_MPI=True -DAER_THRUST_BACKEND=CUDA
pip install -U dist/qiskit_aer*.whl
```

Here is additional information on the GPU (screenshot not included).

Thanks a lot!! Tell me if you need more information :)

dotslaser, Jul 18 '22 07:07

I think this issue is the same as issue #1583. I could not reproduce this one.

doichanj, Nov 11 '22 09:11

Let me close this issue since there has been no response for more than two weeks. Please create a new issue if the problem still needs to be fixed in your environment.

hhorii, Nov 29 '22 11:11