Blake Devcich
It appears that 4.1.0 is the latest that is going to be provided with bullseye: https://packages.debian.org/bullseye/openmpi-bin. I didn't see any updated packages in the updates. It looks like bookworm has...
Thanks. Makes sense. It adds some complexity. I think 4.1.4 (that comes with bookworm) will be fine as it contains the fix that we're interested in. So the path forward...
Any updates on this? Appreciate the work you do!
I created a PR for this: https://github.com/hpc/mpifileutils/pull/540.
Any traction on this?
Here's another case with `dcp -d dbg` enabled. According to the debug output, `mkdir` appears to have worked. ``` [2024-05-29T17:58:39] Walking /lus/global/blake/dm-system-test/src [2024-05-29T17:58:39] Walked 9 items in 0.013 secs (706.261...
I added a pause in our code after the `dcp` failure occurs so I could inspect the Lustre filesystem on each of the nodes. Obviously...
> @bdevcich can you modify your build of mpifileutils so `dcp` does a stat() and reports ownership + permissions on those directories after the mkdir returns? I'm back from vacation...
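The check requested above (call `stat()` right after `mkdir` returns and report ownership and permissions) can be illustrated with a small standalone sketch. This is not the mpifileutils code: the real change would be made in C inside `dcp`, and the function name `mkdir_and_report` is hypothetical, chosen only for this illustration.

```python
import os
import stat
import tempfile


def mkdir_and_report(path, mode=0o755):
    """Create a directory, then stat() it and report owner, group, and mode.

    Hypothetical helper for illustration only; not part of mpifileutils.
    """
    os.mkdir(path, mode)
    st = os.stat(path)
    # S_IMODE strips the file-type bits, leaving only the permission bits.
    perms = stat.S_IMODE(st.st_mode)
    print(f"mkdir {path}: uid={st.st_uid} gid={st.st_gid} mode={perms:04o}")
    return st.st_uid, st.st_gid, perms


if __name__ == "__main__":
    # Use a throwaway directory so the sketch does not touch real data.
    with tempfile.TemporaryDirectory() as base:
        mkdir_and_report(os.path.join(base, "newdir"))
```

If the reported mode differs from what was requested (after the process umask is applied), or the owner is not the calling user, that would suggest the problem lies on the filesystem side rather than in the `mkdir` call itself.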
Here is the output with some added debugging lines from the `stat()`: ``` [2024-06-05T15:00:57] Walking /lus/global/blake/dm-system-test/src [2024-06-05T15:00:57] Walked 9 items in 0.013 secs (702.758 items/sec) ... [2024-06-05T15:00:57] Walked 9 items...
> What version of Lustre is in use on the servers? There was a race in the server request processing code that resulted in their-created files/directories having permission 0000, and...