
MPI segmentation fault with simple circuit on 1 node

Open mdepasca opened this issue 1 year ago • 9 comments

Information

  • Qiskit Aer version: 0.12.0
  • Python version: 3.9
  • Operating system: SUSE Linux Enterprise Server 15 SP1

What is the current behavior?

We built qiskit-aer with MPI support (Intel MPI) on an HPC system. We are currently trying to run this simple test script:

from qiskit import *
from qiskit.circuit.library import QuantumVolume
from qiskit.providers.aer import *
from qiskit.utils import algorithm_globals

consistent_seed_to_all_processes = 12345
algorithm_globals.random_seed = consistent_seed_to_all_processes

sim = AerSimulator(method='statevector', device='CPU', blocking_qubits=5)

shots = 100
depth = 3
qubits = 3
circuit = transpile(QuantumVolume(qubits, depth, seed=2),
                    backend=sim,
                    optimization_level=0)

print(circuit)

circuit.measure_all()
result = execute(circuit, sim, shots=shots,
                 blocking_enable=True, blocking_qubits=5).result()

# Avoid shadowing the built-in `dict`
result_dict = result.to_dict()
print(result_dict.keys())
meta = result_dict['metadata']
myrank = meta['mpi_rank']
print(myrank)

with the following resources:

  • 1 node
  • 16 MPI tasks

At the end of the script, some or all of the tasks fail with a Segmentation Fault error (it is not clear what determines which tasks fail). See the partial output below:

[...]
     ┌──────────┐┌──────────┐┌──────────┐
q_0: ┤0         ├┤0         ├┤0         ├
     │  su4_837 ││          ││          │
q_1: ┤1         ├┤  su4_262 ├┤  su4_110 ├
     └──────────┘│          ││          │
q_2: ────────────┤1         ├┤1         ├
                 └──────────┘└──────────┘
dict_keys(['backend_name', 'backend_version', 'date', 'header', 'qobj_id', 'job_id', 'status', 'success', 'results', 'metadata', 'time_taken'])
13
[...]
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 51068 RUNNING AT i23r02c05s12
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
[...]
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 14 PID 51082 RUNNING AT i23r02c05s12
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

Steps to reproduce the problem

We built and installed qiskit-aer in an Anaconda 3 (2021.05) environment with the following dependencies:

  • cmake 3.21.4
  • gcc 11.2.0
  • openblas 0.3.18 built with gcc 11
  • intelMPI 3.1 built with gcc
  • mpi4py Python module
  • py-pybind11 Python module

and running

python ./setup.py bdist_wheel -- -DAER_MPI=True -DBUILD_TESTS=True
pip install dist/qiskit_aer-*.whl

Then we run the script as follows:

srun -n 16 python 1_mpi_test_CPU.py

What is the expected behavior?

We expect the script to end with no Segmentation Fault errors.

Suggested solutions

None so far

mdepasca avatar Apr 21 '23 13:04 mdepasca

I found an MPI issue and posted PR #1808. I do not know whether this fix is related to this issue, but I could not reproduce the error with this PR.

By the way, in this example, blocking_qubits=5 is not correct because the circuit has only 3 qubits, which is fewer than the number of blocking qubits. When using 16 processes to parallelize the simulation, (number of qubits) - (blocking qubits) should be greater than or equal to 4.
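The rule above can be sketched as a small pre-flight check. This is only an illustration, not part of Qiskit Aer: the function name `check_blocking_qubits` is hypothetical, and the generalization from "4 for 16 processes" to "ceil(log2(number of processes))" is an assumption based on 16 = 2^4.

```python
def check_blocking_qubits(num_qubits, blocking_qubits, num_processes):
    """Return True if the settings satisfy the rule of thumb quoted above.

    Hypothetical helper, not a Qiskit Aer API.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    # Assumption: the required margin is ceil(log2(num_processes)),
    # which gives 4 for 16 processes as stated in the comment above.
    required_margin = (num_processes - 1).bit_length()
    return num_qubits - blocking_qubits >= required_margin


# Original script: 3 qubits, blocking_qubits=5, 16 processes
print(check_blocking_qubits(3, 5, 16))   # False
# A 12-qubit circuit with blocking_qubits = 12 - 4 on 16 processes
print(check_blocking_qubits(12, 8, 16))  # True
```

Running a check like this before calling the simulator would catch a mismatch between circuit size, blocking_qubits, and the number of MPI tasks early.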

doichanj avatar May 10 '23 09:05 doichanj

We performed the suggested changes and re-installed qiskit-aer following the acceptance of PR #1808.

Unfortunately, nothing changed when running our script: we are still receiving Segmentation Fault messages, both from Intel MPI and from Open MPI.

mdepasca avatar Jun 29 '23 13:06 mdepasca

With respect to the first version of the script, I made the following updates:

  • I increased the number of qubits from 3 to 12
  • I set the number of blocking qubits to qubits - 4
  • I increased the depth from 3 to 5

# ...
# consistent_seed_to_all_processes = 12345
# algorithm_globals.random_seed = consistent_seed_to_all_processes

qubits = 12
blockingQubits = qubits - 4

# ...

depth = 5

# ...

circuit.measure_all()
result = execute(
    circuit, sim, shots=shots, blocking_enable=True, blocking_qubits=blockingQubits
).result()

and ran it on 16 processes. This did not help.

mdepasca avatar Jul 13 '23 14:07 mdepasca

I tested with the latest source code of Qiskit Aer, but I could not reproduce the segmentation fault with the script using 16 processes per node. I tried changing some build options and script parameters, but it still runs correctly. Could you please provide a debug trace?

doichanj avatar Aug 02 '23 05:08 doichanj

How would you suggest I produce such a debug trace?

mdepasca avatar Aug 08 '23 09:08 mdepasca

A stack trace can be obtained by using gdb with the dumped core file (run the bt command after loading the core file). To get a useful stack trace, please add the -g compiler option by adding the line below to CMakeLists.txt:

set(AER_COMPILER_FLAGS "${AER_COMPILER_FLAGS} -g")

doichanj avatar Aug 09 '23 04:08 doichanj

Thank you. I understand I should have a core dump file; however, none is actually created when the MPI ranks segfault. Do you have any suggestions on how to get around this?

mdepasca avatar Aug 09 '23 12:08 mdepasca

Before running the program, set the core file size to unlimited:

ulimit -c unlimited

Then, after the segfault occurs, the core file can be loaded into gdb by using coredumpctl:

coredumpctl gdb -1

Then type bt to get the trace.
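The core-limit step above can also be applied from inside the Python script itself. The sketch below uses only the Python standard library (`resource` and `faulthandler`); the function name `enable_crash_diagnostics` and its `stream` parameter are hypothetical. Note the limits of this approach: `faulthandler` prints only the Python-level traceback, not the C++ frames inside Aer, and whether a core file is actually written still depends on how the system is configured.

```python
import faulthandler
import resource
import sys


def enable_crash_diagnostics(stream=None):
    """Raise the core-file limit and install a Python-level crash handler.

    Hypothetical helper for illustration; returns the resulting
    (soft, hard) core-file size limits.
    """
    # Similar in spirit to `ulimit -c unlimited`, but an unprivileged
    # process can only raise the soft limit up to the existing hard limit.
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    # On fatal signals such as SIGSEGV, dump the Python traceback of
    # every thread to the given stream (stderr by default).
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)

    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling `enable_crash_diagnostics()` at the top of the test script, before the simulator is created, would apply the settings in every MPI rank launched by srun.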

doichanj avatar Aug 10 '23 03:08 doichanj

Unfortunately, I can't produce such a file on the system I am on. It is an HPC system, and the sysadmin was very clear that systemd-coredump is not installed (and, I may add, is unlikely to be installed).

mdepasca avatar Aug 24 '23 07:08 mdepasca