Chen Wang

26 comments by Chen Wang

Just tried this on Corona. Didn't see any issues.

Adding another data point. On Frontier, with 628 nodes and 8 ranks/node, bootstrap seems to time out consistently at the `unifyfs_invoke_broadcast_bootstrap_complete()` call.

I happen to have this information since my current paper discusses the MPI consistency model. The MPI standard provides three levels of consistency: 1. sequential consistency among all accesses using...

The apps themselves rarely overwrite the same offset (they rarely perform two collective calls on the same range). It is more likely that the high-level libraries are doing so. E.g., HDF5 uses collective...

According to the PnetCDF documentation, "PnetCDF follows the same parallel I/O data consistency as MPI-IO standard". If this is the case, they should either set the atomic mode when opening...

@adammoody I'm trying to reproduce these conflicts. Which system and MPI implementation were you using?

I just tried ivarn and tst_def_var_fill using OpenMPI and MPICH. They don't show any conflicts on my side; all I/O calls are done internally using MPI_File_write_at_all (eventually only rank 0...

Yes, currently Pilgrim only includes a `pilgrim2text` tool for converting Pilgrim traces into .txt files. I agree it would be beneficial to have a dedicated library for parsing Pilgrim traces....

I have some internal reader code to read/decompress the Pilgrim traces. Please take a look at [pilgrim_reader.h](https://github.com/pmodels/pilgrim/blob/master/include/pilgrim_reader.h). The implementation is in [src/decoder](https://github.com/pmodels/pilgrim/tree/master/src/decoder). I don't currently have good documentation on...

Did you see any output suggesting the simulation was still running? 15 secs vs 3 hours doesn't seem to be an overhead issue. More likely a deadlock/blocking bug in Pilgrim....