openPMD-api
[WIP] Parallel HDF5: 4MB Alignment & Buffer
FS blocksize:
stat -fc %s .
Tried those options on Cori (Scratch and CFS): 8_benchmark case with -w, KNL partition, WarpX-like MPI-rank placement.
modules: ... darshan/3.1.7 gcc/8.3.0 cray-mpich/7.7.10 cray-hdf5-parallel/1.10.5.2 ...
Scratch: 1 MB recommended blocksize (confusingly, stat -fc %s <dir> reports 4 KiB)
CFS: 16 MB blocksize (with 4 MiB subblocks)
Support quote:
blocksize is a quirky parameter for parallel file systems because between your compute node and the actual block devices are a bunch of network and RAID layers that have their own magic sizes. Some arcane knowledge is required
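To compare what the file system reports against the recommended values, a small helper along these lines could be used. It is only a sketch: the function names `fs_blocksize` and `fs_report` are made up for illustration, it assumes GNU coreutils `stat`, and the `lfs` call is guarded because it only exists on Lustre clients.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are not from the PR); assumes GNU coreutils stat.

# Print the block size that stat reports for a directory (default: cwd).
fs_blocksize() {
    stat -fc '%s' "${1:-.}"
}

# Print both stat block sizes and, on Lustre clients, the stripe layout.
fs_report() {
    dir="${1:-.}"
    # %s = block size for faster transfers, %S = fundamental block size
    echo "transfer block size:    $(stat -fc '%s' "$dir") bytes"
    echo "fundamental block size: $(stat -fc '%S' "$dir") bytes"
    # On Lustre, the stripe layout is usually more telling than the stat value.
    if command -v lfs >/dev/null 2>&1; then
        lfs getstripe -d "$dir" || true
    fi
}
```

Calling `fs_report .` in the output directory prints the values side by side, which makes mismatches like the 4 KiB vs. 1 MB case on Scratch easy to spot.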
Sets medium striping. Note: for proper ADIOS2 timings, keep the small default striping (it creates subfiles that should not be heavily striped); for proper HDF5 timings, enable striping (single output file that should be heavily striped).
For HDF5, we can also try T3PIO MPI_Info hints again.
Cori: Darshan Logs
# MPICH statistics collection
export MPICH_MPIIO_STATS=1
export MPICH_MPIIO_HINTS_DISPLAY=1
export MPICH_MPIIO_TIMERS=1
# Darshan extended trace (dxt) logs
export DARSHAN_DISABLE_SHARED_REDUCTION=1
export DXT_ENABLE_IO_TRACE=4
# work-around needed
export LD_PRELOAD=/global/common/cori_cle7/software/darshan/3.1.7/lib/libdarshan.so
# srun <benchmark command>
# disable work-around
unset LD_PRELOAD
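As an alternative to exporting and later unsetting `LD_PRELOAD`, the preload could be scoped to the launched command only. The following is a sketch, not what the job script does: `run_with_darshan` and `DARSHAN_LIB` are hypothetical names, and whether `srun` forwards the variable to the ranks in the same way as the exported form is an assumption that would need checking on Cori.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preload Darshan for a single command only.
# DARSHAN_LIB is an assumed variable; point it at the site's libdarshan.so.
run_with_darshan() {
    # env sets LD_PRELOAD only for the command it launches,
    # so the calling shell's environment stays unchanged.
    env LD_PRELOAD="${DARSHAN_LIB:-}" "$@"
}

# Usage (arguments elided, as in the job script):
#   DARSHAN_LIB=/global/common/cori_cle7/software/darshan/3.1.7/lib/libdarshan.so
#   run_with_darshan srun ...
```

This keeps later commands in the same job script (for example log post-processing) from being traced by accident.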
With #916 merged, we can run these tests again; maybe we will see some improvement when setting striping with chunked datasets.
We should also call H5Pset_alignment (and H5Pset_sieve_buf_size) here:
https://github.com/openPMD/openPMD-api/issues/578#issuecomment-865377511
Next measurements we should try on Cori (Suren Byna):
Option 1
- Set the alignment to 8 MB (in the H5Pset_alignment() call, threshold of 0 and alignment of 8 MB)
- Set striping on the directory where the data is being written.
- Stripe count: 40
- Stripe size: 8 MB
Just in case, here’s the command to set the stripe on a directory.
lfs setstripe --stripe_count 40 --stripe_size 8m ./benchmarks
Option 2
- Alignment of 16 MB
- Stripe count: 40
- Stripe size: 16 MB
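The two options can be spelled out as a dry run that prints the byte value for the alignment and the matching `lfs setstripe` command without executing it. This is a sketch under assumptions: `mib_to_bytes` and `stripe_cmd` are hypothetical helper names, "MB" is read as MiB (2^20 bytes) to match the `m` suffix of `lfs`, and `./benchmarks` is the directory from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are not from the PR).

# Convert MiB to bytes, the unit of the alignment argument
# of H5Pset_alignment(). Assumes "MB" in the options means MiB.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print (do not run) the lfs command for: directory, stripe count,
# stripe size in MiB.
stripe_cmd() {
    echo "lfs setstripe --stripe_count $2 --stripe_size ${3}m $1"
}

# Option 1 uses 8 MB, Option 2 uses 16 MB; both use 40 stripes.
for size_mb in 8 16; do
    echo "alignment: $(mib_to_bytes "$size_mb") bytes (threshold 0)"
    stripe_cmd ./benchmarks 40 "$size_mb"
done
```

Keeping alignment and stripe size derived from the same number avoids testing a mismatched pair by accident.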
Jan/Feb tests
We tried various sizes in Jan/February with the job script linked above in the PR description. We saw no improvement on Cori at the time.
Since then, we have implemented chunking (#406) and changed the benchmark from 4D to 3D (#1010). Also, we have new parallel benchmarks now (8a, 8b).