SU2

For certain numbers of MPI processes and certain meshes, mpirun crashes when it cannot find the inlet profile file

bigfooted opened this issue 2 years ago • 3 comments

Describe the bug: If I start SU2_CFD with SPECIFIED_INLET_PROFILE= YES but the inlet profile file does not exist, an example_profile.dat file is written for you. When I start it in parallel with mpirun, it crashes for certain numbers of processes: for me, it crashes with n = 4, 6, 7, 9, 10 and 12. I have no idea where to look for a solution...

[EDIT] This actually only happens for specific meshes; so far I could only reproduce it on my turbulent 90-degree pipe bend.

[EDIT] A mesh where this happens (the 90-degree pipe bend) can be found here: https://github.com/bigfooted/su2cases/tree/master/validation/sudo_pipe_bend. If you set SPECIFIED_INLET_PROFILE= YES, it crashes for me with mpirun and, e.g., n=4.

bigfooted, Sep 22 '22 21:09

Can you provide a way to reproduce the problem as the bug report template says? Or do we convert this to a discussion?

What is the common denominator when it crashes? Does the marker appear on only one rank? Or on more than one? Or is there no pattern related to partitioning? Does the code work when the file exists? That is, is this related to reading in general, or exclusively to writing the example? If it is related to writing, what happens if you comment out the MPI calls (to gather coordinates etc.)?
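One way to answer the first questions is to have every rank report whether its partition contains a marker of the inlet type. The sketch below simulates the partitioned marker lists in plain C++ instead of calling MPI; the `LocalMarker` struct, the `BCKind` enum and `RanksOwningKind` are hypothetical stand-ins, not SU2 types or functions. In a real run, each rank would evaluate its own flag and the flags would then be gathered on one rank (e.g. with an MPI gather) before printing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for per-rank marker data (not SU2 types).
enum class BCKind { Inlet, Outlet, Wall };

struct LocalMarker {
  std::string tag;  // marker name from the mesh file
  BCKind kind;      // boundary condition type assigned in the config
};

// markersPerRank[r] holds the markers present in rank r's partition.
// Returns the ranks whose partition contains at least one marker of `kind`.
std::vector<int> RanksOwningKind(const std::vector<std::vector<LocalMarker>>& markersPerRank,
                                 BCKind kind) {
  std::vector<int> owners;
  for (int rank = 0; rank < static_cast<int>(markersPerRank.size()); ++rank) {
    for (const auto& marker : markersPerRank[rank]) {
      if (marker.kind == kind) {
        owners.push_back(rank);
        break;  // one match is enough for this rank
      }
    }
  }
  return owners;
}
```

For a 4-rank partition in which only ranks 2 and 3 touch the inlet, this returns `{2, 3}`. A result that changes with the number of ranks, or that excludes rank 0 (the master), points to a partition-dependent problem rather than a general one.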

pcarruscag, Sep 22 '22 22:09

Thank you @pcarruscag. I have added a link to a test case and updated the title, because it looks like this is not a general problem. By the way, can I visualize the partitions in ParaView?

bigfooted, Sep 23 '22 12:09

We have a rank output field. I don't remember which output group it belongs to, but a dry run should tell you.

pcarruscag, Sep 23 '22 12:09

Does the code work when the file exists? That is, is this related to reading in general, or exclusively to writing the example?

Yes, it only happens when writing the example profile: the run never leaves CSolver::LoadInletProfile, although it does reach the end of that routine.

bigfooted, Oct 02 '22 16:10

It is possible that while the master rank is writing the template file and throwing the error, another rank starts trying to access the profile data (which was not read) and then segfaults. Try putting a barrier so that ranks don't escape while the template is being written.

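  if (profile_file.fail()) {
    MergeProfileMarkers();
    WriteMarkerProfileTemplate();
    // Barrier here: keep every rank in this branch until the template is written.
    SU2_MPI::Barrier(SU2_MPI::GetComm());
  } else {
    ReadMarkerProfile();
  }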

pcarruscag, Oct 02 '22 17:10

OK, thanks, SU2_MPI::Barrier definitely helps in narrowing down where the problem is. In MergeProfileMarkers, we get the number of profiles:

for (iMarker = 0; iMarker < config->GetnMarker_All(); iMarker++) {
  if (config->GetMarker_All_KindBC(iMarker) == markerType) {
    numberOfProfiles++;
    // ...
  }
}

Then downstream, we do:

  if (rank == MASTER_NODE) {
    // ...
    profileCoords.resize(numberOfProfiles);
    // ...
  }

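When it fails, it is because on the master rank the condition config->GetMarker_All_KindBC(iMarker) == markerType is never true, although it is true on another rank. Presumably the inlet marker is simply not part of the master's partition, so the master counts numberOfProfiles = 0 and sizes profileCoords accordingly, which is why the crash depends on both the mesh and the number of processes.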
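The mismatch can be reproduced without MPI. In the sketch below, each rank counts only the inlet markers in its own partition, mirroring the loop in MergeProfileMarkers, so the master's count depends on how the mesh was split. One partition-independent alternative is to take the union of inlet marker tags over all ranks (in an MPI code this would require gathering the tags). This is an illustrative sketch under those assumptions, not the change made in the PR that closed this issue; `LocalMarker`, `BCKind`, `LocalProfileCount` and `GlobalProfileCount` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for per-rank marker data (not SU2 types).
enum class BCKind { Inlet, Outlet, Wall };

struct LocalMarker {
  std::string tag;  // marker name from the mesh file
  BCKind kind;      // boundary condition type assigned in the config
};

using Partition = std::vector<LocalMarker>;  // markers present on one rank

// What each rank computes on its own: a purely local count, like the loop in
// MergeProfileMarkers. On a rank that does not hold the inlet this is 0.
std::size_t LocalProfileCount(const Partition& local, BCKind kind) {
  std::size_t count = 0;
  for (const auto& marker : local) {
    if (marker.kind == kind) ++count;
  }
  return count;
}

// Partition-independent count: a marker split across several ranks is
// counted once, and the result does not depend on which rank evaluates it.
std::size_t GlobalProfileCount(const std::vector<Partition>& allRanks, BCKind kind) {
  std::set<std::string> tags;
  for (const auto& local : allRanks) {
    for (const auto& marker : local) {
      if (marker.kind == kind) tags.insert(marker.tag);
    }
  }
  return tags.size();
}
```

With the inlet living only on ranks 2 and 3, `LocalProfileCount` returns 0 on rank 0, so a master-side `profileCoords.resize(numberOfProfiles)` allocates nothing even though other ranks still hold coordinates for that marker. `GlobalProfileCount` returns 1 for any split of the same mesh.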

bigfooted, Oct 03 '22 07:10

completed by PR

bigfooted, Oct 12 '22 14:10