
Error running perpendicular-flap/fluid-openfoam in parallel

Open efirvida opened this issue 3 years ago • 5 comments

Hi, I'm trying to run this tutorial in parallel just using `run.sh -parallel`, but it always fails. I tried several decomposeParDict configurations and found that it only runs in parallel with this setting:

numberOfSubdomains 2;

method          simple;

simpleCoeffs
{
    n               (2 1 1);
    delta           0.001;
}
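The `simple` method above splits the mesh into `n` = (nx ny nz) pieces by ordering cell centres along each axis and cutting them into slabs with equal cell counts. The Python sketch below mimics that under assumptions worth checking against OpenFOAM's `simpleGeomDecomp` source: each axis is split independently, the processor number is `ix + nx*iy + nx*ny*iz`, and the small `delta` skew is ignored. Under those assumptions, processor 0 (the master rank) always receives the lowest-x cells and, for ny > 1, the lowest-y ones. The cell centres used here are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def groups_along_axis(points: Sequence[Point], axis: int, n: int) -> List[int]:
    """Equal-count split of the points into n slabs along one axis.

    Points are ordered by the chosen coordinate; the first len(points) % n
    slabs receive one extra point.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    base, extra = divmod(len(points), n)
    group = [0] * len(points)
    pos = 0
    for g in range(n):
        size = base + (1 if g < extra else 0)
        for i in order[pos:pos + size]:
            group[i] = g
        pos += size
    return group


def simple_decompose(points: Sequence[Point], nx: int, ny: int, nz: int) -> List[int]:
    """Processor index ix + nx*iy + nx*ny*iz from independent per-axis splits."""
    gx = groups_along_axis(points, 0, nx)
    gy = groups_along_axis(points, 1, ny)
    gz = groups_along_axis(points, 2, nz)
    return [x + nx * y + nx * ny * z for x, y, z in zip(gx, gy, gz)]


# Four hypothetical cell centres of a one-cell-thick (2D) mesh.
centres = [(-1.0, 0.5, 0.5), (1.0, 0.5, 0.5), (-1.0, 3.0, 0.5), (1.0, 3.0, 0.5)]
print(simple_decompose(centres, 2, 1, 1))  # [0, 1, 0, 1]
print(simple_decompose(centres, 2, 2, 1))  # [0, 1, 2, 3]
```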

I'm running the preCICE adapters built with EasyBuild, using easyconfigs I wrote myself (see https://github.com/efirvida/easybuild-easyconfigs/commit/62611dc79313063019bce90ba83f42081c1fd998), so I really don't know whether the mistake is in my easyconfigs or in the tutorial.

I plan to submit the easyconfigs to the main EasyBuild repository, but first I have to be sure that they work; after that I can continue my research on FSI.

Another thing that may be important: I'm using Fedora 34, and I had problems building the foss-2020a toolchain due to a Binutils 2.34 bug (https://bugzilla.redhat.com/show_bug.cgi?id=1916925). I changed the Binutils version to 2.36.1 and the Bison version to 3.7.6 for the whole toolchain, which is the main reason for my branch at https://github.com/efirvida/easybuild-easyconfigs/tree/fsi. I don't know whether this introduces bugs into the library.

  • Tutorials state (last commit / release): 45eedc6281bf08958fcefe59756ed58961dcb147
  • Versions of solvers and adapters used: OpenFOAM v2012 with OpenFOAM-adapter-1.0.0
  • preCICE version: 2.2.0 and 2.2.1
  • Log files:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  v2012                                 |
|   \\  /    A nd           | Website:  www.openfoam.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : _7bdb509494-20201222 OPENFOAM=2012
Arch   : "LSB;label=32;scalar=64"
Exec   : blockMesh
Date   : Jun 14 2021
Time   : 19:12:20
Host   : Naboo
PID    : 2943179
I/O    : uncollated
Case   : /home/efirvida/Desktop/dev/PHD/tutorials/perpendicular-flap/fluid-openfoam
nProcs : 1
trapFpe: Floating point exception trapping enabled (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 5, maxFileModificationPolls 20)
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Creating block mesh from "system/blockMeshDict"
Creating block edges
No non-planar block faces defined
Creating topology blocks
Creating topology patches

Creating block mesh topology

Check topology

	Basic statistics
		Number of internal faces : 4
		Number of boundary faces : 22
		Number of defined boundary faces : 22
		Number of undefined boundary faces : 0
	Checking patch -> block consistency

Creating block offsets
Creating merge list (topological search)...
Deleting polyMesh directory "constant/polyMesh"

Creating polyMesh from blockMesh
Creating patches
Creating cells
Creating points with scale 1
    Block 0 cell size :
        i : 0.136132 .. 0.0680662
        j : 0.0666667 .. 0.0666667
        k : 1 .. 1

    Block 1 cell size :
        i : 0.0680662 .. 0.136132
        j : 0.0666667 .. 0.0666667
        k : 1 .. 1

    Block 2 cell size :
        i : 0.136132 .. 0.0680662
        j : 0.0692199 .. 0.13844
        k : 1 .. 1

    Block 3 cell size :
        i : 0.0333333 .. 0.0333333
        j : 0.0692199 .. 0.13844
        k : 1 .. 1

    Block 4 cell size :
        i : 0.0680662 .. 0.136132
        j : 0.0692199 .. 0.13844
        k : 1 .. 1


There are no merge patch pairs

Writing polyMesh with 0 cellZones
----------------
Mesh Information
----------------
  boundingBox: (-3 0 0) (3 4 1)
  nPoints: 5828
  nCells: 2790
  nFaces: 11283
  nInternalFaces: 5457
----------------
Patches
----------------
  patch 0 (start: 5457 size: 45) name: inlet
  patch 1 (start: 5502 size: 45) name: outlet
  patch 2 (start: 5547 size: 33) name: flap
  patch 3 (start: 5580 size: 63) name: upperWall
  patch 4 (start: 5643 size: 60) name: lowerWall
  patch 5 (start: 5703 size: 5580) name: frontAndBack

End

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  v2012                                 |
|   \\  /    A nd           | Website:  www.openfoam.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : _7bdb509494-20201222 OPENFOAM=2012
Arch   : "LSB;label=32;scalar=64"
Exec   : decomposePar -force
Date   : Jun 14 2021
Time   : 19:12:20
Host   : Naboo
PID    : 2943189
I/O    : uncollated
Case   : /home/efirvida/Desktop/dev/PHD/tutorials/perpendicular-flap/fluid-openfoam
nProcs : 1
trapFpe: Floating point exception trapping enabled (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 5, maxFileModificationPolls 20)
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time



Decomposing mesh region0

Removing 2 existing processor directories
Create mesh

Calculating distribution of cells
Selecting decompositionMethod simple [4]

Finished decomposition in 0.01 s

Calculating original mesh data

Distributing cells to processors

Distributing faces to processors

Distributing points to processors

Constructing processor meshes

Processor 0
    Number of cells = 698
    Number of faces shared with processor 1 = 8
    Number of faces shared with processor 2 = 31
    Number of processor patches = 2
    Number of processor faces = 39
    Number of boundary faces = 1465

Processor 1
    Number of cells = 697
    Number of faces shared with processor 0 = 8
    Number of faces shared with processor 3 = 33
    Number of processor patches = 2
    Number of processor faces = 41
    Number of boundary faces = 1463

Processor 2
    Number of cells = 697
    Number of faces shared with processor 0 = 31
    Number of faces shared with processor 3 = 23
    Number of processor patches = 2
    Number of processor faces = 54
    Number of boundary faces = 1448

Processor 3
    Number of cells = 698
    Number of faces shared with processor 1 = 33
    Number of faces shared with processor 2 = 23
    Number of processor patches = 2
    Number of processor faces = 56
    Number of boundary faces = 1450

Number of processor faces = 95
Max number of cells = 698 (0.0716846% above average 697.5)
Max number of processor patches = 2 (0% above average 2)
Max number of faces between processors = 56 (17.8947% above average 47.5)

Time = 0

Processor 0: field transfer
Processor 1: field transfer
Processor 2: field transfer
Processor 3: field transfer

End

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  v2012                                 |
|   \\  /    A nd           | Website:  www.openfoam.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : _7bdb509494-20201222 OPENFOAM=2012
Arch   : "LSB;label=32;scalar=64"
Exec   : pimpleFoam -parallel
Date   : Jun 14 2021
Time   : 19:12:23
Host   : Naboo
PID    : 2943207
I/O    : uncollated
Case   : /home/efirvida/Desktop/dev/PHD/tutorials/perpendicular-flap/fluid-openfoam
nProcs : 4
Hosts  :
(
    (Naboo 4)
)
Pstream initialized with:
    floatTransfer      : 0
    nProcsSimpleSum    : 0
    commsType          : nonBlocking
    polling iterations : 0
trapFpe: Floating point exception trapping enabled (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 5, maxFileModificationPolls 20)
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

Selecting dynamicFvMesh dynamicMotionSolverFvMesh
Selecting motion solver: displacementLaplacian
Applying solid body motion to entire mesh
Selecting motion diffusion: quadratic
Selecting motion diffusion: inverseDistance
Selecting patchDistMethod meshWave

PIMPLE: Operating solver in PISO mode

Reading field p

Reading field U

Reading/calculating face flux field phi

Selecting incompressible transport model Newtonian
Selecting turbulence model type laminar
Selecting laminar stress model Stokes
No MRF models present

No finite volume options present
Constructing face velocity Uf

Courant Number mean: 0 max: 0

Starting time loop

---[preciceAdapter] Loaded the OpenFOAM-preCICE adapter v1.0.0.
---[preciceAdapter] Reading preciceDict...
---[precice] This is preCICE version 2.2.1
---[precice] Revision info: no-info [Git failed/Not a repository]
---[precice] Configuration: Release (Debug and Trace log unavailable)
---[precice] Configuring preCICE with configuration "../precice-config.xml"
---[precice] I am participant "Fluid"
---[precice] Connecting Master to 3 Slaves
[2]PETSC ERROR: ------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[2]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[2]PETSC ERROR: to get more information on the crash.
[2]PETSC ERROR: User provided function() line 0 in  unknown file  
[3]PETSC ERROR: ------------------------------------------------------------------------
[3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[3]PETSC ERROR: to get more information on the crash.
[3]PETSC ERROR: User provided function() line 0 in  unknown file  
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
---[precice] Setting up master communication to coupling partner/s
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: User provided function() line 0 in  unknown file  
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[1]PETSC ERROR: to get more information on the crash.
[1]PETSC ERROR: User provided function() line 0 in  unknown file  
[Naboo:2943192] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[Naboo:2943192] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

efirvida, Jun 14 '21 23:06

I can reproduce this with OpenFOAM v2012 (installed from .deb) on Ubuntu 21.04, with preCICE v2.2.1 (built from source). My system has only two physical cores, and I use export OMPI_MCA_rmaps_base_oversubscribe=1 in my ~/.bashrc.

It does not seem to matter if the interface is "cut" by the parallel boundary:

  • in 2x1 it is cut and it works,
  • in 3x1 it is not cut and it fails
  • in 4x1 it is cut and it fails
  • in 2x2 it is cut and it fails
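The 1D cases in the list above can be illustrated with a toy model: a hypothetical row of 30 fluid cells on each side of the flap, already sorted by x and split into nx equal-count slabs, as the `simple` method does along x. The sketch lists which ranks own none of the cells touching the flap. It does not model the real graded mesh or the 2x2 case, so it only illustrates the idea of ranks without interface cells rather than confirming it.

```python
from typing import List, Set


def slab_owner(num_cells: int, nx: int) -> List[int]:
    """Equal-count split of cells (already sorted by x) into nx slabs.

    The first num_cells % nx slabs get one extra cell.
    """
    base, extra = divmod(num_cells, nx)
    owner: List[int] = []
    for rank in range(nx):
        owner.extend([rank] * (base + (1 if rank < extra else 0)))
    return owner


def ranks_without_interface(cells_per_side: int, nx: int) -> Set[int]:
    """Ranks owning no cell that touches the flap, for a 1D row of cells.

    Hypothetical layout: cells_per_side cells left of the flap, then
    cells_per_side cells right of it; only the two cells directly next to
    the flap (indices cells_per_side - 1 and cells_per_side) touch it.
    """
    owner = slab_owner(2 * cells_per_side, nx)
    touching = {owner[cells_per_side - 1], owner[cells_per_side]}
    return set(range(nx)) - touching


for nx in (2, 3, 4):
    print(f"{nx}x1:", sorted(ranks_without_interface(30, nx)))
# 2x1: []
# 3x1: [0, 2]
# 4x1: [0, 3]
```

In this toy model only the 2x1 split leaves every rank with an interface cell; in the 3x1 and 4x1 splits the outermost ranks, including rank 0, own none.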

Since people have used the OpenFOAM adapter with more ranks, and since we have also run, e.g., the turek-hron-fsi3 case with 25 ranks, this should be specific to the tutorial or the system.

@efirvida how many physical & logical cores do you have on your system?

MakisH, Jun 15 '21 10:06

@MakisH I'm running on a laptop with an i7-8650U, so I have 4 cores with 2 threads each. I tested the old version of the tutorial by rolling the repository back to commit 5f4031fc7e45807dca787a525569b39a1909d2a3, and it works fine. I also use -oversubscribe and tested it with up to 12 partitions. I haven't had time to compare the tutorials to see what's different, and I don't have much experience with preCICE yet, but the old version didn't fail in any of my tests.

efirvida, Jun 15 '21 13:06

I think the crucial factor here is whether the master rank of OpenFOAM owns interface nodes or not. IIRC I already had a similar issue in the past. I'm still a bit puzzled whether the issue is triggered on the OpenFOAM side or on the preCICE side; I have some cases to test. A workaround should still be given by this approach.

davidscn, Jun 16 '21 15:06

I think it is an issue in the adapter rather than in preCICE. Some corner cases with empty master ranks were fixed in the preCICE bugfix release v2.2.1, and IIRC I have already run empty-master cases with other solvers. I need to build the adapter in debug mode (CXX_FLAG='-g') to get more information here:

[2] #3  ? at Interface.C:?
[0] #4  preciceAdapter::Interface::Interface(precice::SolverInterface&, Foam::fvMesh const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool) at Interface.C:?
[2] #4  preciceAdapter::Interface::Interface(precice::SolverInterface&, Foam::fvMesh const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool) at ??:?
[0] #5  preciceAdapter::Adapter::configure() at ??:?
[2] #5  preciceAdapter::Adapter::configure() at ??:?
[0] #6  Foam::functionObjects::preciceAdapterFunctionObject::read(Foam::dictionary const&) at ??:?
[2] #6  Foam::functionObjects::preciceAdapterFunctionObject::read(Foam::dictionary const&) at ??:?
[0] #7  Foam::functionObjects::preciceAdapterFunctionObject::preciceAdapterFunctionObject(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[2] #7  Foam::functionObjects::preciceAdapterFunctionObject::preciceAdapterFunctionObject(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #8  Foam::functionObject::adddictionaryConstructorToTable<Foam::functionObjects::preciceAdapterFunctionObject>::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[2] #8  Foam::functionObject::adddictionaryConstructorToTable<Foam::functionObjects::preciceAdapterFunctionObject>::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #9  Foam::functionObject::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[2] #9  Foam::functionObject::New(Foam::word const&, Foam::Time const&, Foam::dictionary const&) at ??:?
[0] #10  Foam::functionObjectList::read() at ??:?
[2] #10  Foam::functionObjectList::read() at ??:?
[0] #11  Foam::Time::run() const at ??:?
[2] #11  Foam::Time::run() const at ??:?

davidscn, Jun 16 '21 16:06

I can confirm that I can successfully run test cases (no OpenFOAM) where the master rank is not located at the interface.

davidscn, Jun 18 '21 12:06