[WIP] Parallel HDF5: 4MB Alignment & Buffer

Open ax3l opened this issue 4 years ago • 3 comments

FS blocksize:

stat -fc %s .

Tried those options on Cori (Scratch and CFS): 8_benchmark case with -w, KNL partition, WarpX-like MPI-rank placement. modules: ... darshan/3.1.7 gcc/8.3.0 cray-mpich/7.7.10 cray-hdf5-parallel/1.10.5.2 ...

Scratch: 1MB recommended blocksize (confusingly, stat -fc %s <dir> reports 4KiB)

CFS: 16 MB blocksize (with 4MiB subblocks)
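
To compare what the file system itself reports with the recommended values, a minimal sketch could look like the following. It assumes GNU coreutils `stat`. The `lfs getstripe` call only applies on Lustre (such as Cori Scratch) and is skipped when `lfs` is not installed. The function name `report_blocksize` is hypothetical.

```shell
#!/bin/sh
# report_blocksize is a hypothetical helper name, not part of any tool.
report_blocksize() (
    dir="${1:-.}"
    # %s: block size for faster transfers, %S: fundamental block size
    echo "transfer_block_size=$(stat -fc %s "$dir")"
    echo "fundamental_block_size=$(stat -fc %S "$dir")"
    # On Lustre, the directory's default striping is usually more telling
    if command -v lfs >/dev/null 2>&1; then
        lfs getstripe -d "$dir" || true
    fi
)

report_blocksize .
```

As noted above, the value from `stat` can differ from the recommended blocksize, so it is best read as one data point rather than as the tuning target.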

Support quote:

blocksize is a quirky parameter for parallel file systems because between your compute node and the actual block devices are a bunch of network and RAID layers that have their own magic sizes. Some arcane knowledge is required

Sets medium striping. Note: for proper ADIOS2 timings, keep the small default striping (it creates subfiles that should not be heavily striped); for proper HDF5 timings, enable striping (single output file that should be heavily striped).

For HDF5, we can also try T3PIO MPI_Info hints again.

cori.sbatch.txt

Cori: Darshan Logs

# MPICH statistics collection
export MPICH_MPIIO_STATS=1
export MPICH_MPIIO_HINTS_DISPLAY=1
export MPICH_MPIIO_TIMERS=1

# Darshan extended trace (dxt) logs
export DARSHAN_DISABLE_SHARED_REDUCTION=1
export DXT_ENABLE_IO_TRACE=4

# work-around needed
export LD_PRELOAD=/global/common/cori_cle7/software/darshan/3.1.7/lib/libdarshan.so

# srun ...

# disable work-around
unset LD_PRELOAD

ax3l, Jan 13 '21 07:01

We can run these tests again once #916 is merged; we may see some improvement when setting striping with chunked data sets.

ax3l, Jun 11 '21 19:06

Should also set H5Pset_alignment (and H5Pset_sieve_buf_size) here: https://github.com/openPMD/openPMD-api/issues/578#issuecomment-865377511
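
As a reminder of what the alignment setting does: with H5Pset_alignment(), HDF5 places every file object of at least `threshold` bytes at an address that is a multiple of `alignment`. The rounding can be illustrated with plain shell arithmetic. `align_up` is a hypothetical helper for illustration only, not an HDF5 function, and the 4 MiB value is taken from the issue title.

```shell
# align_up OFFSET ALIGNMENT: round OFFSET up to the next multiple of ALIGNMENT
# (hypothetical helper; only mimics the address rounding, it does not touch HDF5)
align_up() (
    offset="$1"
    alignment="$2"
    echo $(( (offset + alignment - 1) / alignment * alignment ))
)

four_mib=$(( 4 * 1024 * 1024 ))
align_up 1 "$four_mib"            # an object that would otherwise start at byte 1
align_up "$four_mib" "$four_mib"  # already aligned, address is unchanged
```

With a threshold of 0, this rounding applies to every object, which trades some wasted space in the file for writes that start on stripe boundaries.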

ax3l, Jun 24 '21 02:06

Next measurements we should try on Cori (Suren Byna):

Option 1

  • Set the alignment to 8 MB (in the H5Pset_alignment() call, threshold of 0 and alignment of 8MB)
  • Set striping on the directory where the data is being written.
  • Stripe count: 40
  • Stripe size: 8 MB

Just in case, here’s the command to set the stripe on a directory.

lfs setstripe --stripe_count 40 --stripe_size 8m ./benchmarks

Option 2

  • Alignment of 16 MB
  • Stripe count: 40
  • Stripe size: 16 MB
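
The two options can be written down as a small dry-run sketch that prints the settings instead of applying them. `stripe_plan` is a hypothetical helper. It assumes that "MB" in the options means MiB, which is how `lfs setstripe` treats the `m` suffix, and that the same byte value is passed as the alignment in the H5Pset_alignment() call.

```shell
# stripe_plan SIZE_MIB COUNT DIR: print (do not run) the settings for one option.
# Hypothetical helper; assumes "MB" means MiB, matching the lfs "m" suffix.
stripe_plan() (
    size_mib="$1"
    count="$2"
    dir="$3"
    bytes=$(( size_mib * 1024 * 1024 ))
    echo "lfs setstripe --stripe_count ${count} --stripe_size ${size_mib}m ${dir}"
    echo "H5Pset_alignment: threshold=0 alignment=${bytes}"
)

stripe_plan 8 40 ./benchmarks    # Option 1
stripe_plan 16 40 ./benchmarks   # Option 2
```

Keeping stripe size and alignment derived from one number avoids the two drifting apart between runs.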

Jan/Feb tests

We tried various sizes in Jan/February with the job script linked above in the PR description. We saw no improvement on Cori at the time.

Since then, we implemented chunking (#406) and changed the benchmark from 4D to 3D (#1010). We also have new parallel benchmarks now (8a, 8b).

ax3l, Jun 24 '21 07:06