
Multi-node simulation environment

Open jiaduxie opened this issue 4 years ago • 125 comments

Do you have a recommended tutorial for configuring a multi-node NEST simulation environment? Even a brief list of the configuration steps and the required packages would help.

jiaduxie avatar Sep 10 '20 00:09 jiaduxie

Hi jiaduxie,

could you provide a bit more detail about the problems you are running into?

The Readme.md of this repository already has some pointers on how to set up the environment and run the model.

The exact steps you have to take depend on the cluster you are using. What kind of job scheduler does it use? Is Python 3 available? Does it have MPI support? Have you already installed NEST?

A rough sketch:

  1. Manually compile NEST on your cluster and make sure Python and MPI support are enabled. Do not use the conda version (it has neither MPI nor OpenMP support). Use an official release (NEST master has features which are not yet supported by this repository). Depending on your cluster you may need to load some modules (e.g. Python, MPI, ...).

  2. Make sure to install all Python packages listed in requirements.txt. Run: pip3 install -r requirements.txt. If the cluster does not allow this, try: pip3 install --user -r requirements.txt

  3. Tell the job scheduler on your system how to run the job. You will need to copy the file config_template.py to config.py. Change base_path to the absolute path of the multi-area-model repository and data_path to the path where you want to store the output of the simulation. Adapt the jobscript_template to your system; if it uses SLURM, there is already an example in place which you can uncomment and try (see the config.py sketch after this list). Make sure to also load all packages that a multi-node simulation requires (e.g. MPI). Change submit_cmd from None to the command your job scheduler uses; for SLURM it is sbatch.

  4. Try to run run_example_fullscale.py. You should now be able to run python run_example_fullscale.py. This will set up the simulation environment, do all the preprocessing and finally submit the job to the cluster. Depending on your cluster you might want to change num_processes and local_num_threads.
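For orientation, here is a minimal sketch of what config.py could look like after step 3. The SBATCH directives and module names are placeholders for your own cluster (not taken from the repository); keep the actual job command line from config_template.py.

# config.py -- minimal sketch; all values are placeholders to adapt
base_path = '/absolute/path/to/multi-area-model'    # where this repository is checked out
data_path = '/absolute/path/to/simulation_output'   # where simulation results will be written

# Example SLURM jobscript; adjust nodes, tasks, threads, walltime and modules
# to your cluster and keep the python/srun command line from config_template.py.
jobscript_template = """#!/bin/bash
#SBATCH --job-name=multi-area-model
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --time=24:00:00
module load gcc openmpi python   # placeholder module names
# ... keep the srun/python line from config_template.py here ...
"""

submit_cmd = 'sbatch'   # the command your scheduler uses to submit jobs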

If all of this works, you should be ready to run your own experiments.

I hope this helps! Best, Jari

jarsi avatar Sep 10 '20 10:09 jarsi

Oh, thanks Jari. I run the model in a conda NEST environment, but I installed the conda version of NEST that comes with MPI; is that also not suitable? If I install NEST from source, will the MPI I build manually conflict with the MPI already on the server? I'll come back to you if I have further questions.

jiaduxie avatar Sep 10 '20 13:09 jiaduxie

I am only aware of a conda version of NEST which does not have MPI support. But maybe it exists.

To check whether your NEST version supports MPI and OpenMP, could you run the following command in your environment and post the output:

python -c "import nest; nest.Simulate(1.)"

My conda-installed NEST reports in the start_updating_ info message that neither MPI nor OpenMP is available:

Sep 10 16:07:01 SimulationManager::start_updating_ [Info]:
    Number of local nodes: 0
    Simulation time (ms): 1
    Not using OpenMP
    Not using MPI

Concerning manual compilation. How did you try to compile NEST? Could you post what steps you have tried so far?

jarsi avatar Sep 10 '20 14:09 jarsi

I haven't started trying to compile manually yet. I ran the following in my conda environment, and the output is as follows:

$python -c "import nest; nest.Simulate(1.)"

Creating default RNGs
Creating new default global RNG

-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: nest-2.18.0 Built: Jan 27 2020 12:49:17

This program is provided AS IS and comes with NO WARRANTY. See the file LICENSE for details.

Problems or suggestions? Visit https://www.nest-simulator.org Type 'nest.help()' to find out more about NEST.

Sep 10 22:20:26 NodeManager::prepare_nodes [Info]: Preparing 0 nodes for simulation.

Sep 10 22:20:26 SimulationManager::start_updating_ [Info]:
    Number of local nodes: 0
    Simulation time (ms): 1
    Number of OpenMP threads: 1
    Number of MPI processes: 1

Sep 10 22:20:26 SimulationManager::run [Info]: Simulation finished.

jiaduxie avatar Sep 10 '20 14:09 jiaduxie

It seems alright. Have you installed the packages from requirements.txt? Have you tried running a simulation?

jarsi avatar Sep 10 '20 14:09 jarsi

Yes, I have installed the packages from requirements.txt. Can you check whether the command to run a multi-node simulation should look like this: mpirun -hostfile hostfile python run_example_downscaled.py

The hostfile is the following:

work0 slots = 2
work1 slots = 2

jiaduxie avatar Sep 25 '20 03:09 jiaduxie

I have no experience with hostfiles, but it looks reasonable to me. Have you adjusted num_processes and local_num_threads in the sim_dict? Have you tried running it? Did it work?

The run_example_downscaled.py is meant to be run on a local machine, for example a laptop. If you would like to experiment on a compute cluster you should exchange M.simulation.simulate() with start_job(M.simulation.label, submit_cmd, jobscript_template) (see run_example_fullscale.py) and additionally import:

from start_jobs import start_job
from config import submit_cmd, jobscript_template

In this case you need to invoke the script serially:

python run_example.py

The parallelized part is then specified in the jobscript_template in config.py.
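Putting this together, a minimal sketch of how the end of such a script could look, assuming the model object M is built exactly as in run_example_downscaled.py (only the submission part changes):

from start_jobs import start_job
from config import submit_cmd, jobscript_template

# ... build the MultiAreaModel instance M as in run_example_downscaled.py ...

# instead of running NEST directly via M.simulation.simulate(),
# hand the job over to the scheduler configured in config.py:
start_job(M.simulation.label, submit_cmd, jobscript_template)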

jarsi avatar Sep 25 '20 07:09 jarsi

Hi jarsi, if I run the complete model on a cluster of two servers, roughly how much memory does each machine need?

jiaduxie avatar Oct 09 '20 11:10 jiaduxie

The model consumes approximately 1 TB of memory. So with two servers each server would need to provide 500 GB.

jarsi avatar Oct 09 '20 11:10 jarsi

Okay, thank you. Then, when you run the entire model, how many servers do you use and how much memory does each one have?

jiaduxie avatar Oct 09 '20 11:10 jiaduxie

Hi jarsi, on what kind of system do you run multiple nodes in parallel? My system is Ubuntu, and I have not managed to configure SLURM properly. Do you have any guidance on configuring the environment?

jiaduxie avatar Oct 12 '20 07:10 jiaduxie

Hi, we do not set up the systems ourselves. We use, for example, JURECA at Forschungszentrum Juelich; it has everything we need already installed. What kind of system are you using?

jarsi avatar Oct 12 '20 09:10 jarsi

It is a Linux server; the distribution is Ubuntu. Apart from running on JURECA, have you also run the model on an ordinary server of your own?

jiaduxie avatar Oct 12 '20 09:10 jiaduxie

Hi jarsi, I am now simulating a small network for testing on two machines, and I run it with the following command. It seems that the two machines each run on their own without interacting.

`mpirun.mpich -np 2 -host work0,work1 python ./multi_test.py`

In addition, have you run the multi-area-model in your own cluster environment?

jiaduxie avatar Oct 28 '20 08:10 jiaduxie

This is weird. Have you adjusted the num_processes or local_num_threads variable in the sim_params dictionary? An example of how to do this is shown in the run_example_fullscale.py file. In your case you should set num_processes=2. These variables are needed in order to inform NEST about distributed computing.
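For example, a sketch of such a dictionary (the 't_sim' key and its value are just a hypothetical short test duration; the rest of the model setup stays as in run_example_fullscale.py):

sim_params = {
    't_sim': 100.0,           # simulation duration in ms (placeholder value)
    'num_processes': 2,       # one MPI process per machine, matching -np 2
    'local_num_threads': 1,   # OpenMP threads per MPI process
}
# pass this dictionary to the model constructor as shown in run_example_fullscale.py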

Maybe you could also post what is in your multi_test.py file?

I have run the model on a local cluster. I usually just need to modify the run_example_fullscale.py and config.py to my own needs.

jarsi avatar Oct 28 '20 09:10 jarsi

multi_test.py:

from nest import *
SetKernelStatus({"total_num_virtual_procs": 4})
pg = Create("poisson_generator", params={"rate": 50000.0})
n = Create("iaf_psc_alpha", 4)
sd = Create("spike_detector", params={"to_file": True})
Connect(pg, [n[0]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[0]], [n[1]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[1]], [n[2]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[2]], [n[3]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect(n, sd)
Simulate(100.0)

jiaduxie avatar Oct 28 '20 09:10 jiaduxie

This is difficult for me to debug. On my machine I can run this without running into errors. It works with the conda installed nest (conda create --name nest_conda -c conda-forge 'nest-simulator=*=mpi_openmpi*' python) and with nest compiled from source. I suspect there might be a problem with the host file. Unfortunately I do not know a lot about those, usually the system administrators take care of this.

On your machine, are you using a resource manager such as SLURM, PBS/Torque, or LSF, or are you responsible for defining everything correctly using hostfiles? What kind of system are you using?

jarsi avatar Oct 28 '20 11:10 jarsi

The cluster environment I use consists of nine ordinary server machines. The system is Linux, and the distribution is Debian. You run this model on a supercomputer, right? Have you ever run it in your own environment? Is it necessary to install a SLURM resource scheduling system? I ran into a lot of problems while installing SLURM, so I have not installed it.

jiaduxie avatar Oct 28 '20 12:10 jiaduxie

It is not necessary to install SLURM. But I have most experience with it as all clusters I have used so far had SLURM installed. Installing a resource manager is not trivial and should be the job of a system admin, not the user. Do you have a system administrator you could ask for help? How do other people run distributed jobs on this cluster?

Could you also try the following commands and report whether something changes:

mpiexec -np 2 -host work0,work1 python ./multi_test.py

mpirun -np 2 -host work0,work1 python ./multi_test.py

jarsi avatar Oct 28 '20 13:10 jarsi

Because my cluster environment consists of ordinary servers, there is no resource scheduling system such as SLURM installed. It seems that the commands you suggested cannot run the simulation either.

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpiexec -np 2 -host work0,work1 python multi_test.py
bash: orted: command not found

ORTE was unable to reliably start one or more daemons. This usually is caused by:

  • not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default

  • lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.

  • the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.

  • compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.

  • an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).


(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpirun -np 2 -host work0,work1 python ./multi_test.py
bash: orted: command not found

ORTE was unable to reliably start one or more daemons. This usually is caused by:

  • not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default

  • lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.

  • the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.

  • compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.

  • an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).


jiaduxie avatar Oct 28 '20 13:10 jiaduxie

Just to make sure, you are using nest installed via conda, right?

What do the following commands give you:

conda list
which mpirun
which mpiexec
which mpirun.mpich

jarsi avatar Oct 28 '20 13:10 jarsi

Yes, I installed nest under conda, and it seems I have it installed:

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ conda list
llvm-meta 7.0.0 0 conda-forge
matplotlib 3.3.0 pypi_0 pypi
mpi 1.0 openmpi conda-forge
mpi4py 3.0.3 py38h246a051_1 conda-forge
ncurses 6.2 he1b5a44_1 conda-forge
nest-simulator 2.18.0 mpi_openmpi_py38h72811e1_7 conda-forge
nested-dict 1.61 pypi_0 pypi
numpy 1.19.1 py38h8854b6b_0 conda-forge
openmp 7.0.0 h2d50403_0 conda-forge
openmpi 4.0.4 hdf1f1ad_0 conda-forge
openssh 8.3p1 h5957347_0 conda-forge
openssl 1.1.1g h516909a_1 conda-forge
pandas 1.1.0 py38h950e882_0 conda-forge

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpirun
/home/work/anaconda3/envs/pynest_mpi/bin/mpirun
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpiexec
/home/work/anaconda3/envs/pynest_mpi/bin/mpiexec
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpirun.mpich

jiaduxie avatar Oct 28 '20 13:10 jiaduxie

Ok thanks, the output of the last command is missing.

Using conda list you can see that nest is linked against Open MPI. This is one of many MPI libraries. The command mpirun.mpich, to my understanding, instructs MPI to use the MPICH implementation. This is different from the Open MPI version that nest is linked against, and the two are not compatible, as we can also see when you use mpirun.mpich. Both mpiexec and mpirun are installed inside of your conda environment and should be compatible with nest. I don't understand why you get the error message when using these.
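One way to see whether the MPI processes actually talk to each other is a tiny mpi4py test, independent of nest (a sketch; check_mpi.py is just a hypothetical file name, and mpi4py is already in your conda list above):

# check_mpi.py -- print rank/size and the MPI library each process uses
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("rank", comm.Get_rank(), "of", comm.Get_size(),
      "on", MPI.Get_processor_name())
print(MPI.Get_library_version().splitlines()[0])

Started with mpirun.mpich (the launcher that currently starts for you), each rank reporting "of 1" would confirm that the processes run independently; started with the Open MPI mpirun from the conda environment, e.g. mpirun -np 2 -host work0,work1 python ./check_mpi.py, both ranks should report "of 2" once the orted problem is solved.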

jarsi avatar Oct 28 '20 13:10 jarsi

Maybe you could also check the output of:

mpirun --version
mpirun.mpich --version

jarsi avatar Oct 28 '20 13:10 jarsi

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpirun.mpich
/usr/bin/mpirun.mpich
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpirun --version
mpirun (Open MPI) 4.0.4
Report bugs to http://www.open-mpi.org/community/help/
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpirun.mpich --version
HYDRA build details:
    Version: 3.3a2
    Release Date: Sun Nov 13 09:12:11 MST 2016
    CC: gcc -Wl,-Bsymbolic-functions -Wl,-z,relro
    CXX: g++ -Wl,-Bsymbolic-functions -Wl,-z,relro
    F77: gfortran -Wl,-Bsymbolic-functions -Wl,-z,relro
    F90: gfortran -Wl,-Bsymbolic-functions -Wl,-z,relro
    Configure options: '--disable-option-checking' '--prefix=/usr' '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-libfabric' '--enable-shared' '--enable-fortran=all' '--disable-rpath' '--disable-wrapper-rpath' '--sysconfdir=/etc/mpich' '--libdir=/usr/lib/x86_64-linux-gnu' '--includedir=/usr/include/mpich' '--docdir=/usr/share/doc/mpich' '--with-hwloc-prefix=system' '--enable-checkpointing' '--with-hydra-ckpointlib=blcr' 'CPPFLAGS= -Wdate-time -D_FORTIFY_SOURCE=2 -I/build/mpich-O9at2o/mpich-3.3~a2/src/mpl/include -I/build/mpich-O9at2o/mpich-3.3~a2/src/mpl/include -I/build/mpich-O9at2o/mpich-3.3~a2/src/openpa/src -I/build/mpich-O9at2o/mpich-3.3~a2/src/openpa/src -D_REENTRANT -I/build/mpich-O9at2o/mpich-3.3~a2/src/mpi/romio/include' 'CFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'CXXFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'FFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2' 'FCFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2' 'build_alias=x86_64-linux-gnu' 'MPICHLIB_CFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'MPICHLIB_CXXFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_FFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong' 'MPICHLIB_FCFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'FC=gfortran' 'F77=gfortran' 'MPILIBNAME=mpich' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'LIBS=' 'MPLLIBNAME=mpl'
    Process Manager: pmi
    Launchers available: ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available: hwloc
    Resource management kernels available: user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available: blcr
    Demux engines available: poll select

jiaduxie avatar Oct 28 '20 14:10 jiaduxie

I think the problem is that once the jobs start to run on a node the mpi library cannot be found. This is because the PATH and LD_LIBRARY_PATH are not exported. Could you try the following:

mpirun --prefix /home/work/anaconda3/envs/pynest_mpi/bin -np 2 -host work0,work1 python ./multi_test.py

jarsi avatar Oct 28 '20 14:10 jarsi

Hi, have you made progress?

I think the problems you are seeing are related to your MPI libraries. As the conda nest is compiled against Open MPI, you must also use Open MPI and not MPICH. This means that mpirun should be the command to use, but we are seeing that this does not work. My guess is that once nest starts to run on the nodes, it does not find the correct MPI library, gets confused, and the nest instances run independently because they do not know how to use MPI. According to the Open MPI FAQ you can try several things.

  1. Specify which MPI library to use via --prefix. (I think in my previous message there was an error in the prefix.)
  • mpirun --prefix /home/work/anaconda3/envs/pynest_mpi -np 2 -host work0,work1 python ./multi_test.py
  2. Specify which MPI library to use by giving the complete Open MPI path.
  • /home/work/anaconda3/envs/pynest_mpi/bin/mpirun -np 2 -host work0,work1 python ./multi_test.py
  3. Add the following to ~/.profile:
export PATH=/home/work/anaconda3/envs/pynest_mpi/bin:$PATH
export LD_LIBRARY_PATH=/home/work/anaconda3/envs/pynest_mpi/:$LD_LIBRARY_PATH

Does any of these approaches work or change the error message?

jarsi avatar Oct 30 '20 09:10 jarsi

I've tried them and it still doesn't work. Did you use conda to install nest or did you compile it from source?

jiaduxie avatar Oct 30 '20 12:10 jiaduxie

I tried it with both and both worked on my machine. What is the output of the different approaches posted above?

jarsi avatar Oct 30 '20 12:10 jarsi

It seems that it was running on work0, but work1 (Ubuntu16) terminated the job. Do you also use mpirun to run it? I am running under conda; the version I compiled from source is broken, with many kernel functions undefined, and it appears to be an old version.

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ /home/work/anaconda3/envs/pynest_mpi/bin/mpirun -np 2 -host work0,work1 python ./multi_test.py

[INFO] [2020.10.30 21:26:1 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217 @ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.10.30 21:26:1 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260 @ Network::create_grng_] : Creating new default global RNG
[INFO] [2020.10.30 21:26:1 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217 @ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.10.30 21:26:1 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260 @ Network::create_grng_] : Creating new default global RNG
python: /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/sli/scanner.cc:581: bool Scanner::operator()(Token&): Assertion `in->good()' failed.
[ubuntu16:18103] *** Process received signal ***
[ubuntu16:18103] Signal: Aborted (6)
[ubuntu16:18103] Signal code: (-6)
[ubuntu16:18103] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fe0987a2890]
[ubuntu16:18103] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fe0983dde97]
[ubuntu16:18103] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fe0983df801]
[ubuntu16:18103] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7fe0983cf39a]
[ubuntu16:18103] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7fe0983cf412]
[ubuntu16:18103] [ 5] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN7ScannerclER5Token+0x1489)[0x7fe08ad39eb9]
[ubuntu16:18103] [ 6] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN6ParserclER5Token+0x49)[0x7fe08ad2c229]
[ubuntu16:18103] [ 7] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/../../../libsli.so(_ZNK14IparseFunction7executeEP14SLIInterpreter+0x96)[0x7fe08ad63666]
[ubuntu16:18103] [ 8] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/../../../libsli.so(+0x74193)[0x7fe08ad22193]
[ubuntu16:18103] [ 9] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter8execute_Em+0x222)[0x7fe08ad26a32]
[ubuntu16:18103] [10] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter7startupEv+0x27)[0x7fe08ad26e57]
[ubuntu16:18103] [11] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/../../../libnest.so(_Z11neststartupPiPPPcR14SLIInterpreterNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1ea0)[0x7fe08b779a40]
[ubuntu16:18103] [12] /home/work/anaconda3/envs/pynest_mpi/lib/python3.8/site-packages/nest/pynestkernel.so(+0x444dc)[0x7fe08bb764dc]
[ubuntu16:18103] [13] python(+0x1b2f24)[0x56225ad72f24]
[ubuntu16:18103] [14] python(_PyEval_EvalFrameDefault+0x4bd)[0x56225ad9c83d]
[ubuntu16:18103] [15] python(_PyFunction_Vectorcall+0x1b7)[0x56225ad89197]
[ubuntu16:18103] [16] python(_PyEval_EvalFrameDefault+0x71b)[0x56225ad9ca9b]
[ubuntu16:18103] [17] python(_PyEval_EvalCodeWithName+0x260)[0x56225ad87ff0]
[ubuntu16:18103] [18] python(+0x1f68ca)[0x56225adb68ca]
[ubuntu16:18103] [19] python(+0x139ffd)[0x56225acf9ffd]
[ubuntu16:18103] [20] python(PyVectorcall_Call+0x6e)[0x56225ad1ddee]
[ubuntu16:18103] [21] python(_PyEval_EvalFrameDefault+0x60fd)[0x56225ada247d]
[ubuntu16:18103] [22] python(_PyEval_EvalCodeWithName+0x260)[0x56225ad87ff0]
[ubuntu16:18103] [23] python(_PyFunction_Vectorcall+0x594)[0x56225ad89574]
[ubuntu16:18103] [24] python(_PyEval_EvalFrameDefault+0x4ea3)[0x56225ada1223]
[ubuntu16:18103] [25] python(_PyFunction_Vectorcall+0x1b7)[0x56225ad89197]
[ubuntu16:18103] [26] python(_PyEval_EvalFrameDefault+0x4bd)[0x56225ad9c83d]
[ubuntu16:18103] [27] python(_PyFunction_Vectorcall+0x1b7)[0x56225ad89197]
[ubuntu16:18103] [28] python(_PyEval_EvalFrameDefault+0x71b)[0x56225ad9ca9b]
[ubuntu16:18103] [29] python(_PyFunction_Vectorcall+0x1b7)[0x56225ad89197]
[ubuntu16:18103] *** End of error message ***

          -- N E S T --

Copyright (C) 2004 The NEST Initiative

Version: nest-2.18.0 Built: Jan 27 2020 12:49:17

This program is provided AS IS and comes with NO WARRANTY. See the file LICENSE for details.

Problems or suggestions? Visit https://www.nest-simulator.org

Type 'nest.help()' to find out more about NEST.

Oct 30 21:26:01 ModelManager::clear_models_ [Info]: Models will be cleared and parameters reset.

Oct 30 21:26:01 Network::create_rngs_ [Info]: Deleting existing random number generators

Oct 30 21:26:01 Network::create_rngs_ [Info]: Creating default RNGs

Oct 30 21:26:01 Network::create_grng_ [Info]: Creating new default global RNG

Oct 30 21:26:01 RecordingDevice::set_status [Info]: Data will be recorded to file and to memory.

[lyjteam-server][[20644,1],0][btl_tcp.c:559:mca_btl_tcp_recv_blocking] recv(17) failed: Connection reset by peer (104)

Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 1 with PID 18103 on node work1 exited on signal 6 (Aborted).

jiaduxie avatar Oct 30 '20 13:10 jiaduxie