UnifyFS
Excluding mountpoint for generating gfid
To generate a global file identifier (gfid), UnifyFS uses the full pathname, including the mountpoint. This works fine when the file system is mounted under a consistent mountpoint (e.g., /unifyfs) across sessions. However, if a user uses different mountpoints across sessions, this becomes a problem:
[root@ninja70 testbed]$ mpirun -hostfile nodelist.txt -n 4 prefix/libexec/sysio-write-gotcha -m /unifyfs
[root@ninja70 testbed]$ mpirun -hostfile nodelist.txt -n 4 prefix/libexec/sysio-read-gotcha -m /unifyfs
Number of processes: 4
Each process wrote: 128.000000 MB
Total reads: 512.000000 MB
I/O pattern: N to 1
I/O request size: 65536 B
Aggregate read bandwidth: 668.811403 MB/s
Min. read bandwidth: 216.077518 MB/s
Total Read time: 2.369520 sec.
[root@ninja70 testbed]$ mpirun -hostfile nodelist.txt -n 4 prefix/libexec/sysio-read-gotcha -m /unifyfs2
[3] open failed (errno=2, No such file or directory)
[0] open failed (errno=2, No such file or directory)
[2] open failed (errno=2, No such file or directory)
[1] open failed (errno=2, No such file or directory)
When generating the gfid (and in other places where UnifyFS uses full pathnames), the mountpoint prefix should be excluded, i.e., /testfile-0 instead of /unifyfs/testfile-0.
@sandrain I disagree. The mountpoint serves as a namespace for its files. If multiple apps within the same session (i.e., same server) want to act on the same files, they should use the same mountpoint. Currently, we support multiple mountpoints within the same session, and the new client API will as well. Removing the mountpoint from the hash would lead to conflicts among the differing namespaces.
@MichaelBrim Okay, that makes sense. I just felt it might not be straightforward because it doesn't follow the conventional expectation of a mountpoint.
Although gfid is still meant to be unique across all app_ids, we do have the app_id so that we could use {app_id, gfid} as the unique identifier instead.
Currently, gfid is the MD5 hash of the full file path, which includes the mountpoint. If a client uses a different mountpoint but the same suffix path, it will get a different gfid. In the new client API, the app_id is dynamically generated by taking the MD5 hash of the mount/namespace prefix. So if we wanted to use {prefix-path MD5, suffix-path MD5} instead of just the whole-path MD5, we could. But I think that would just require adding a whole bunch of code to pass an app_id where it wasn't necessary before.
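To make the difference concrete, here is a minimal Python sketch of the two schemes discussed above. It is not the UnifyFS implementation: the function names are hypothetical, and truncating the MD5 digest to its first 4 bytes is an assumption made only to get an integer identifier for illustration.

```python
import hashlib
from typing import Tuple


def md5_id(path: str) -> int:
    """Hypothetical identifier: first 4 bytes of MD5(path) as an unsigned int.

    The 4-byte truncation is an assumption for this sketch, not a statement
    about how UnifyFS derives its gfid from the digest.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def whole_path_gfid(mountpoint: str, suffix: str) -> int:
    """Current scheme as described above: hash the full path, mountpoint included."""
    return md5_id(mountpoint + suffix)


def split_id(mountpoint: str, suffix: str) -> Tuple[int, int]:
    """Alternative {prefix-path MD5, suffix-path MD5}: hash each part separately."""
    return md5_id(mountpoint), md5_id(suffix)


# Same suffix under two mountpoints: the whole-path identifiers are unrelated.
gfid_a = whole_path_gfid("/unifyfs", "/testfile-0")
gfid_b = whole_path_gfid("/unifyfs2", "/testfile-0")

# With the split form, the suffix component is identical under both
# mountpoints, while the prefix component still separates the namespaces.
pair_a = split_id("/unifyfs", "/testfile-0")
pair_b = split_id("/unifyfs2", "/testfile-0")
print(gfid_a == gfid_b, pair_a[1] == pair_b[1], pair_a[0] == pair_b[0])
```

In the split form, the first element plays the role of the app_id and the second is a mountpoint-independent file identifier, which is why every lookup would then need to carry both values.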
@MichaelBrim that's a good point about the mountpoint serving as the namespace name. Question though:
Let's say your UnifyFS dir is called /stuff. You write files to /stuff. Then the server admin decides to create a totally unrelated /stuff dir. Will UnifyFS currently let you mount your dataset as, say, /tmp/stuff, so you can access your files?
In the current code, you can't access your files under a different mountpoint. However, regardless of what exists or has happened on a real file system, our prefix matching will always redirect file accesses under /stuff to Unify if you give that as the mountpoint. If you're familiar with bind mounts, you can think of Unify as a bind mount.
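The prefix matching described here can be sketched roughly as follows. This is an illustrative Python sketch, not UnifyFS code: the function name is hypothetical, and matching only on whole path components (so that /stuff2 does not match /stuff) is an assumption.

```python
def is_intercepted(path: str, mountpoint: str) -> bool:
    """Return True if `path` falls under `mountpoint`.

    Purely a string comparison: the real file system is never consulted,
    so an unrelated directory with the same name makes no difference.
    Matching on whole path components is an assumption of this sketch.
    """
    mp = mountpoint.rstrip("/")
    return path == mp or path.startswith(mp + "/")


# With /stuff given as the mountpoint, accesses under /stuff are redirected,
# while the same files addressed through /tmp/stuff are not.
print(is_intercepted("/stuff/data.bin", "/stuff"))
print(is_intercepted("/tmp/stuff/data.bin", "/stuff"))
```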
@MichaelBrim that's true. You couldn't copy from the server's /stuff to Unify's /stuff though, but that's probably an edge case we don't care about.