ADIOS2
discussion: file system topologies
Hi @williamfgc and @pnorbert
This is not really an issue - more of a stackoverflow type of question - but figured that either of you might have addressed or thought about it. I did a quick scour of the ADIOS and MPI-IO code bases, but nothing jumped out at me.
I'm looking for a reasonable and efficient means of determining what the filesystem topology looks like for a given simulation. By topology I just mean trying to determine which nodes/processes are associated with which shared portions of a filesystem.
Eg,
rank | host | path |
---|---|---|
0 | host0 | /my/path/case |
1 | host0 | /my/path/case |
8 | host1 | /my/path/case |
... | ... | ... |
16 | host2 | /mnt/other/path/case |
... | ... | ... |
I would now like to determine which of the paths (/my/path/case, /mnt/other/path/case, ...) actually point to the same shared (eg, NFS) filesystem. For me the obvious brute-force method would be to have every unique host touch a file into existence (eg, touch filesystem-probe.$(hostname)), maybe sleep for a little, and then use a directory listing to see what got created. After that we can work out which hosts have what in common with each other.
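The grouping step of this brute-force idea could look roughly like the sketch below. It assumes each host has already collected the host-name suffixes of the probe files it can see, and that every probe file was visible by the time the listing was taken. The function name groupHostsByListing and the input layout are made up for illustration; nothing here comes from ADIOS2.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: 'seen' maps each host name to the set of probe-file
// suffixes (host names) that this host found in its directory listing.
// Hosts that report identical listings are assumed to share one filesystem.
std::vector<std::vector<std::string>>
groupHostsByListing(const std::map<std::string, std::set<std::string>> &seen)
{
    // Key: the listing itself; value: the hosts that reported that listing.
    std::map<std::set<std::string>, std::vector<std::string>> byListing;
    for (const auto &entry : seen)
    {
        byListing[entry.second].push_back(entry.first);
    }

    // Flatten into one vector of host names per distinct listing.
    std::vector<std::vector<std::string>> groups;
    for (const auto &entry : byListing)
    {
        groups.push_back(entry.second);
    }
    return groups;
}
```

Grouping by the whole listing rather than by pairwise checks keeps the work to one map insertion per host, but a listing taken too early would put a host into a group of its own.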
This is the general idea, but it is also pretty ugly - I can't wait for the sysadmin to complain about 10k "touches" occurring at every simulation startup (immediately followed by the dir listing!). It seems there must be a better way, but I have no idea what it would be. I'm not sure if there is anything within ADIOS that does something similar for making good guesses at aggregation, for example.
Any ideas would be helpful.
Cheers,
/mark
If you're on Linux, this might help:
[kaig1@login5 ~]$ findmnt --target $PWD
TARGET SOURCE FSTYPE OPTIONS
/autofs/nccs-svm1_home1 172.30.252.196:/nccs/home1 nfs rw,nosuid,relatime,vers=3,rsize=32768,wsize=32768
Note that findmnt has options to make it easier to use in scripts, e.g.:
[kaig1@login5 ~]$ findmnt -o SOURCE -n --target $PWD
172.30.252.196:/nccs/home1
Thanks @germasch
I guess that I should have mentioned that I probably want to avoid assumptions about using Linux (and need to put this into C/C++), but could perhaps bury some findmnt equivalent into a Linux-specific optimization.
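A Linux-only findmnt equivalent in C++ might be sketched as a longest-prefix match over a mount table in the /proc/self/mounts format (source, target, fstype, options, ...). This is only an illustration: MountEntry, isPrefixDir and findMount are hypothetical names, the path is assumed to be absolute with symlinks already resolved, and mount points containing spaces (escaped as \040 in that file) are not decoded.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record holding the first three fields of one mount-table line.
struct MountEntry
{
    std::string source, target, fstype;
};

// True if 'dir' equals 'path' or is one of its parent directories.
static bool isPrefixDir(const std::string &dir, const std::string &path)
{
    if (dir == "/")
    {
        return !path.empty() && path[0] == '/';
    }
    if (path.compare(0, dir.size(), dir) != 0)
    {
        return false;
    }
    return path.size() == dir.size() || path[dir.size()] == '/';
}

// Return the entry with the longest mount point containing 'path'.
// On ties the later line wins, since a later mount shadows an earlier one.
// An entry with an empty target means that nothing matched.
MountEntry findMount(std::istream &mounts, const std::string &path)
{
    MountEntry best;
    std::string line;
    while (std::getline(mounts, line))
    {
        std::istringstream fields(line);
        MountEntry e;
        if (!(fields >> e.source >> e.target >> e.fstype))
        {
            continue; // skip malformed or empty lines
        }
        if (isPrefixDir(e.target, path) && e.target.size() >= best.target.size())
        {
            best = e;
        }
    }
    return best;
}
```

In real use the stream would be a std::ifstream opened on /proc/self/mounts. Taking a std::istream keeps the matching logic testable and lets the Linux-specific part stay in one place.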
@olesenm I'm not familiar with NFS. In the past there were Lustre options for getting striping and OST mapping on Titan using the vendor API. If I recall correctly, @pnorbert and @jychoi-hpc would be more familiar with understanding the topology (it's been a while). For Summit we don't have such fine-tuning options for GPFS, as stated here. I don't think there is anything "portable" in C++ (even in C++17 std::filesystem; adios2 is still C++11). Perhaps the device ID from POSIX stat has the info? Aggregation in adios2 just follows the application rank ordering for grouping, assuming proximity. Hope it helps.
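The POSIX stat idea can be tried with a few lines of C++11. The sketch below reads st_dev for a path and compares two paths on the same host. The helper names deviceIdOf and onSameDevice are hypothetical. One caveat to keep in mind: st_dev is only specified to identify a device within one system, so for network mounts the value is not guaranteed to match between hosts and may not answer the cross-host question on its own.

```cpp
#include <string>
#include <sys/stat.h> // POSIX stat()

// Hypothetical helper: fetch the device ID of the filesystem holding 'path'.
// Returns false if stat() fails (e.g. the path does not exist).
bool deviceIdOf(const std::string &path, dev_t &device)
{
    struct stat info;
    if (::stat(path.c_str(), &info) != 0)
    {
        return false;
    }
    device = info.st_dev;
    return true;
}

// True only if both paths can be stat'ed and report the same st_dev.
// Assumption: both paths are inspected on the same host.
bool onSameDevice(const std::string &a, const std::string &b)
{
    dev_t devA, devB;
    return deviceIdOf(a, devA) && deviceIdOf(b, devB) && devA == devB;
}
```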
@olesenm We never encountered a system where the same shared file system is on different local paths, and never tried to solve this problem. If this case is not solvable as an administration issue, there should be some config files describing the topology of such an ad-hoc distributed setup. 10k touches at each startup may convince the admins to provide a usable configuration file.
Well, I don't even understand the scenario. Which one is it?
- there is a shared file system but it has a different path on each host - I commented on this scenario.
- there are multiple shared file systems, each has a specific path to it, but nodes have access to only one of them, and you want to group the nodes that have access to the same file system?
Hi @pnorbert - missed the first notice, then holidays etc...
In the meantime, I think the only case we really need to worry about is when the paths are identical on all machines: we'd then want to know whether they are indeed all on NFS or on local disks. I think the suggestion from @germasch will suffice (at least for Linux).
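For the "NFS or local disk" check, a possible Linux-only sketch uses statfs() and compares the reported filesystem type against the NFS magic number. FsKind, kindFromMagic and fsKindOf are hypothetical names. The constant 0x6969 is NFS_SUPER_MAGIC from linux/magic.h, and on non-Linux platforms the sketch simply reports Unknown. Each rank would still have to exchange its result with the others (for example over MPI) to confirm that all of them agree.

```cpp
#include <string>
#ifdef __linux__
#include <sys/vfs.h> // statfs(), Linux-specific
#endif

// Hypothetical classification used only in this sketch.
enum class FsKind
{
    Nfs,
    Other,
    Unknown
};

// 0x6969 is NFS_SUPER_MAGIC from <linux/magic.h>, repeated here so the
// sketch does not depend on kernel headers being installed.
FsKind kindFromMagic(unsigned long magic)
{
    return magic == 0x6969UL ? FsKind::Nfs : FsKind::Other;
}

// Classify the filesystem holding 'path'. Returns Unknown if statfs()
// fails or if the platform is not Linux.
FsKind fsKindOf(const std::string &path)
{
#ifdef __linux__
    struct statfs info;
    if (::statfs(path.c_str(), &info) != 0)
    {
        return FsKind::Unknown;
    }
    return kindFromMagic(static_cast<unsigned long>(info.f_type));
#else
    (void)path; // no portable equivalent assumed here
    return FsKind::Unknown;
#endif
}
```

Other shared filesystems (Lustre, GPFS, ...) have their own type values and would show up as Other here, so the check only answers the narrow NFS question.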