
Gasoline and Anarchy-PU crashing with additional physics

Open FHusko opened this issue 3 years ago • 7 comments

Hi SWIFT team,

I have been running simulations with the Gasoline and Anarchy-PU hydro schemes. The setup is a spherically symmetric gas halo, initially in hydrostatic equilibrium, in an external NFW potential. I have tested this setup thoroughly with SPHENIX at several resolution levels, running up to 3 Gyr. With Gasoline and Anarchy-PU, the run crashes at around 100 Myr. I don't remember the exact error I got with Anarchy-PU, but this is what I get with Gasoline:

[m7124:86540:0:86720] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2af38c120440)
[m7127:123101:0:123272] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2b31b42e27a0)
==== backtrace (tid:  86720) ====
0 0x00000000004ea4d2 space_parts_get_cell_index_mapper()  ???:0
1 0x000000000049b618 threadpool_runner()  threadpool.c:0
2 0x0000000000007ea5 start_thread()  pthread_create.c:0
3 0x00000000000fe9fd __clone()  ???:0
=================================
==== backtrace (tid: 123272) ====
0 0x00000000004ea4d2 space_parts_get_cell_index_mapper()  ???:0
1 0x000000000049b618 threadpool_runner()  threadpool.c:0
2 0x0000000000007ea5 start_thread()  pthread_create.c:0
3 0x00000000000fe9fd __clone()  ???:0
=================================
[m7125:70337:0:70525] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2ac87ce8cb90)
==== backtrace (tid:  70525) ====
0 0x00000000004ea4d2 space_parts_get_cell_index_mapper()  ???:0
1 0x000000000049b618 threadpool_runner()  threadpool.c:0
2 0x0000000000007ea5 start_thread()  pthread_create.c:0
3 0x00000000000fe9fd __clone()  ???:0 
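To compare the per-thread traces quickly, a small script can pull out the innermost frame of each backtrace. This is a minimal sketch that assumes the log format shown above; the names `top_frames`, `HEADER` and `FRAME` are made up for illustration, and the embedded log is an abbreviated copy of the output above (first two frames per thread).

```python
import re
from collections import Counter

# Abbreviated copy of the log above (first two frames per thread).
LOG = """\
==== backtrace (tid:  86720) ====
0 0x00000000004ea4d2 space_parts_get_cell_index_mapper()  ???:0
1 0x000000000049b618 threadpool_runner()  threadpool.c:0
=================================
==== backtrace (tid: 123272) ====
0 0x00000000004ea4d2 space_parts_get_cell_index_mapper()  ???:0
1 0x000000000049b618 threadpool_runner()  threadpool.c:0
=================================
==== backtrace (tid:  70525) ====
0 0x00000000004ea4d2 space_parts_get_cell_index_mapper()  ???:0
1 0x000000000049b618 threadpool_runner()  threadpool.c:0
"""

# Assumed line formats, inferred only from the log above.
HEADER = re.compile(r"^==== backtrace \(tid:\s*(\d+)\) ====$")
FRAME = re.compile(r"^\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(\w+)\(\)")


def top_frames(log_text):
    """Return {tid: function name of frame 0} for each backtrace in the log."""
    tops = {}
    tid = None
    for line in log_text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            tid = int(header.group(1))
            continue
        frame = FRAME.match(line)
        if frame and tid is not None and int(frame.group(1)) == 0:
            tops[tid] = frame.group(3)
    return tops


tops = top_frames(LOG)
summary = Counter(tops.values())
for function, count in summary.items():
    print(f"{function}: innermost frame in {count} backtrace(s)")
```

For the three traces above, every thread reports the same innermost function, `space_parts_get_cell_index_mapper()`.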

The run command is

mpirun -np 8 /cosma/home/durham/dc-husk1/SWIFT_SPH/swiftsim/examples/swift_mpi --external-gravity --self-gravity --hydro --temperature --threads=14 --limiter --sync --pin params.yml

This is on 4 nodes of cosma7. I have also tried running without MPI and on cosma6, and I get errors in every case. The configure options are

--with-cooling=COLIBRE --with-chemistry=EAGLE --enable-boundary-particles=10000000 --with-hydro=gasoline --with-gravity=with-multi-softening --with-tracers=EAGLE --with-ext-potential=nfw

The parameter file contains the following:

metaData:
  run_name:   IsolatedGalaxy-EAGLE-Ref

# Define the system of units to use internally.
InternalUnitSystem:
  UnitMass_in_cgs:     1.98848e43    # 10^10 M_sun in grams
  UnitLength_in_cgs:   3.08566e21    # 1 kpc in cm
  UnitVelocity_in_cgs: 1e5           # 1 km/s in cm/s
  UnitCurrent_in_cgs:  1             # Amperes
  UnitTemp_in_cgs:     1             # Kelvin

# Parameters for the self-gravity scheme
Gravity:
  eta:          0.025                 # Constant dimensionless multiplier for time integration.
  MAC:          geometric
  theta_cr:     0.7                   # Opening angle (Multipole acceptance criterion).
  use_tree_below_softening:  0
  max_physical_DM_softening:     0.3 # Physical softening length (in internal units).
  max_physical_baryon_softening: 0.3 # Physical softening length (in internal units).
  mesh_side_length:              256

# Parameters governing the time integration (Set dt_min and dt_max to the same value for a fixed time-step run.)
TimeIntegration:
  time_begin:        0.    # The starting time of the simulation (in internal units).
  time_end:          2   # The end time of the simulation (in internal units).
  dt_min:            1e-14  # The minimal time-step size of the simulation (in internal units).
  dt_max:            1e-2  # The maximal time-step size of the simulation (in internal units).

# Parameters governing the snapshots
Snapshots:
  basename:              output      # Common part of the name of output files
  time_first:            0.          # Time of the first output if non-cosmological time-integration (in internal units)
  delta_time:            0.0125       # Time difference between consecutive outputs (in internal units)
  compression:           7           # Compress the snapshots
  select_output_on:      1
  select_output:         param_list.yml
  output_list_on:        1
  output_list:           output_list.txt

Restarts:
  delta_hours:           1

Scheduler:
  max_top_level_cells:   20

# Parameters governing the conserved quantities statistics
Statistics:
  delta_time:           1e-1     # Time between statistics output
  time_first:              0     # (Optional) Time of the first stats output if non-cosmological time-integration (in internal units)

# Parameters related to the initial conditions
InitialConditions:
  file_name:               ICs.hdf5 # The file to read
  periodic:                0            # Are we running with periodic ICs?
#  stars_smoothing_length:  0.6

# Parameters for the hydrodynamics scheme
SPH:
  resolution_eta:        1.2348   # Target smoothing length in units of the mean inter-particle separation (1.2348 == 48Ngbs with the cubic spline kernel).
  CFL_condition:         0.2      # Courant-Friedrichs-Lewy condition for time integration.
  h_min_ratio:           0.1      # Minimal smoothing in units of softening.
  h_max:                 10.
  minimal_temperature:   100.

# Standard EAGLE cooling options
EAGLECooling:
  dir_name:                /cosma6/data/dp004/dc-husk1/SWIFT/IsolatedGalaxy/IsolatedGalaxy_feedback/coolingtables/  # Location of the Wiersma+09 cooling tables
  H_reion_z:               7.5               # Redshift of Hydrogen re-ionization
  H_reion_eV_p_H:          2.0               # Energy injected by Hydrogen re-ionization in electron-volt per Hydrogen atom
  He_reion_z_centre:       3.5               # Redshift of the centre of the Helium re-ionization Gaussian
  He_reion_z_sigma:        0.5               # Spread in redshift of the  Helium re-ionization Gaussian
  He_reion_eV_p_H:         2.0               # Energy injected by Helium re-ionization in electron-volt per Hydrogen atom

# COLIBRE cooling parameters
COLIBRECooling:
  dir_name:                /cosma6/data/dp004/dc-husk1/SWIFT/IsolatedGalaxy/IsolatedGalaxy_feedback/UV_dust1_CR1_G1_shield1.hdf5 # Location of the Ploeckinger+20 cooling tables
  H_reion_z:               7.5               # Redshift of Hydrogen re-ionization (Planck 2018)
  H_reion_eV_p_H:          2.0
  He_reion_z_centre:       3.5               # Redshift of the centre of the Helium re-ionization Gaussian
  He_reion_z_sigma:        0.5               # Spread in redshift of the  Helium re-ionization Gaussian
  He_reion_eV_p_H:         2.0               # Energy injected by Helium re-ionization in electron-volt per Hydrogen atom
  delta_logTEOS_subgrid_properties: 0.3      # delta log T above the EOS below which the subgrid properties use Teq assumption
  rapid_cooling_threshold:          0.333333 # Switch to rapid cooling regime for dt / t_cool above this threshold.

# Use solar abundances
EAGLEChemistry:
  init_abundance_metal:     0.0129
  init_abundance_Hydrogen:  0.7065
  init_abundance_Helium:    0.2806
  init_abundance_Carbon:    0.00207
  init_abundance_Nitrogen:  0.000836
  init_abundance_Oxygen:    0.00549
  init_abundance_Neon:      0.00141
  init_abundance_Magnesium: 0.000591
  init_abundance_Silicon:   0.000683
  init_abundance_Iron:      0.0011

# NFW potential parameters
NFWPotential:
  useabspos:          0             # 0 -> positions based on centre, 1 -> absolute positions
  position:           [0.0,0.0,0.0] # Location of centre of the NFW potential with respect to centre of the box (internal units) if useabspos=0 otherwise with respect to the 0,0,0, coordinates.
  concentration:      5.6            # Concentration of the halo
  M_200:              10000.         # Mass of the halo (M_200 in internal units)
  critical_density:   1.36e-8       # Critical density (internal units).
  timestep_mult:      0.025          # Dimensionless pre-factor for the time-step condition, basically determines fraction of orbital time we need to do an integration step
  epsilon:            0.3
  h:                  0.7
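For reference, the parameter file above can be translated into physical quantities with a few lines of arithmetic. This is a rough sketch, not taken from SWIFT: it assumes a Gyr of 10^9 Julian years and the usual definition of R_200 as the radius enclosing 200 times the critical density. The helper `gyr_to_internal` is a made-up name.

```python
import math

# Values copied from the parameter file above.
UNIT_LENGTH_CGS = 3.08566e21   # 1 kpc in cm
UNIT_VELOCITY_CGS = 1e5        # 1 km/s in cm/s
M_200 = 10000.0                # halo mass, internal units (10^10 M_sun)
CONCENTRATION = 5.6            # NFW concentration
CRITICAL_DENSITY = 1.36e-8     # internal units
SNAPSHOT_DELTA_TIME = 0.0125   # internal time units

# Assumption: 1 Gyr = 1e9 Julian years of 365.25 days each.
SECONDS_PER_GYR = 1e9 * 365.25 * 86400.0

# The internal time unit follows from unit length / unit velocity.
unit_time_gyr = UNIT_LENGTH_CGS / UNIT_VELOCITY_CGS / SECONDS_PER_GYR


def gyr_to_internal(t_gyr):
    """Convert a time in Gyr to internal time units."""
    return t_gyr / unit_time_gyr


# Assumption: R_200 encloses a mean density of 200 x critical density.
r_200 = (3.0 * M_200 / (4.0 * math.pi * 200.0 * CRITICAL_DENSITY)) ** (1.0 / 3.0)
r_s = r_200 / CONCENTRATION  # NFW scale radius

crash_time_internal = gyr_to_internal(0.1)  # the ~100 Myr crash time
snapshot_interval_myr = SNAPSHOT_DELTA_TIME * unit_time_gyr * 1e3

print(f"internal time unit : {unit_time_gyr:.4f} Gyr")
print(f"100 Myr            : {crash_time_internal:.4f} internal units")
print(f"snapshot interval  : {snapshot_interval_myr:.2f} Myr")
print(f"R_200              : {r_200:.1f} kpc")
print(f"r_s                : {r_s:.1f} kpc")
```

With these units, one internal time unit is a little under 1 Gyr, so the crash at around 100 Myr corresponds to roughly 0.1 internal time units.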

I have tried building with debug, debugging checks and the sanitizer, but these didn't yield any additional error-related information that I could see. I am running with these again and will share the new output if you think that will help.

Thanks for the help in advance!

Edit: Here's the output with debugging turned on.

[m7031:259876:0:260037] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2b14b40008c0)
[m7028:86884:0:87046] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2ab4d45beb50)
[m7029:265468:0:265468] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffe07982770)

/cosma/home/durham/dc-husk1/SWIFT_SPH/swiftsim/src/space_cell_index.c: [ space_parts_get_cell_index_mapper() ]
      ...
      145       /* Is this a place-holder for on-the-fly creation? */
      146       ind[k] = index;
      147       cell_counts[index]++;
==>   148       ++count_extra_part;
      149
      150     } else {
      151       /* Normal case: list its top-level cell index */

==== backtrace (tid:  87046) ====
 0 0x00000000004ea5fc space_parts_get_cell_index_mapper()  /cosma/home/durham/dc-husk1/SWIFT_SPH/swiftsim/src/space_cell_index.c:148
 1 0x000000000049b24a threadpool_chomp()  /cosma/home/durham/dc-husk1/SWIFT_SPH/swiftsim/src/threadpool.c:164
 2 0x000000000049b24a threadpool_runner()  /cosma/home/durham/dc-husk1/SWIFT_SPH/swiftsim/src/threadpool.c:191
 3 0x0000000000007ea5 start_thread()  pthread_create.c:0
 4 0x00000000000fe9fd __clone()  ???:0
=================================
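The excerpt above shows the fault next to the writes `ind[k] = index` and `cell_counts[index]++`. The sketch below is a generic illustration of how a position maps to a flattened top-level cell index. It is not SWIFT's actual code, and the trace does not establish that a bad index is the cause here. The box size is hypothetical, and using 20 cells per dimension assumes that `max_top_level_cells: 20` is a per-dimension count.

```python
import math


def top_level_cell_index(pos, box_size, cdim):
    """Return the flattened cell index for pos, or None if pos is invalid.

    A position is treated as invalid if any coordinate is NaN or infinite,
    or falls outside [0, box_size) along its axis. Using such a position
    unchecked would give an index outside the cell_counts array.
    """
    idx = []
    for x, width, n in zip(pos, box_size, cdim):
        if not math.isfinite(x):
            return None
        i = math.floor(x / width * n)
        if i < 0 or i >= n:
            return None
        idx.append(i)
    # Row-major flattening of the three per-axis indices.
    return idx[2] + cdim[2] * (idx[1] + cdim[1] * idx[0])


# Hypothetical box; the real box size comes from the ICs, not shown here.
box = (1000.0, 1000.0, 1000.0)
cdim = (20, 20, 20)

samples = [
    (0.0, 0.0, 0.0),
    (999.9, 999.9, 999.9),
    (1000.1, 500.0, 500.0),         # outside a non-periodic box
    (float("nan"), 500.0, 500.0),   # non-finite coordinate
]
for p in samples:
    print(p, "->", top_level_cell_index(p, box, cdim))
```

A check of this kind, run over particle positions from the last snapshot before the crash, would show whether any particle has left the box or picked up a non-finite coordinate.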

FHusko, Nov 29 '21 16:11