eio
eio copied to clipboard
Improve handling of Windows paths
In https://github.com/ocaml-multicore/eio/issues/124, @samwgoldman said:
For Flow, we have the need to share paths between platforms. These are relative paths, using the
/path separator. We only support Windows systems that transparently support the/path separator, so we can just pass these paths into Windows APIs without transforming them.
However, Eio also needs to support absolute paths on Windows and it seems this doesn't work (see https://discuss.ocaml.org/t/how-to-specify-a-full-windows-path-in-eio/14880). It would be good to investigate why.
@dra27 commented:
I must confess that introducing yet another notation for a Windows path doesn’t look at all sensible to me!
/C:/pathis simply invalid, and if that’s ever printed (error messages, exceptions, etc.) it kinda just looks like Eio doesn’t know what it’s doing.
As mentioned in the original issue:
The Fpath library encodes lots of knowledge about Windows paths. It probably can't be used directly because its paths behave differently depending on the host platform (preventing e.g. a program running on Linux from creating paths that are to be used on Windows). The path rules should probably be per-filesystem instead.
I think we should give up on a uniform representation of paths and instead move some operations (e.g. split) to Fs.Pi.DIR, so that each file-system can provide its own implementation.
Apologies for the misleading feedback I shared. Absolute paths never came up in Flow (in practice, all paths are relative to the process's working directory) and I wasn't aware that Windows APIs don't support / for absolute paths.
@samwgoldman not misleading at all; it was very useful to know we could ship 1.0 like that. I just tagged you in case you objected to a change of direction, but I think it won't affect you either way.
Various thoughts, partly following up a chat with @patricoferris last week: it sounds as though it's worth separating what EIO needs internally from paths vs what the EIO user sees/provides. From that:
- There is no "root" directory, so in terms of what's displayed / expected from a caller, one can never be provided (
C:\Windows,\Windows,\\LOCALHOST\c$\Windowsare all flavours of absolute paths on Windows and, on this machine, with a cwd on Drive C:, they all refer to the same physical directory). I think rather than trying to identify a concept of a root directory, it's better instead to let go of the notion entirely i.e. the portable design is to assume that there is no root directory. Internally, NT Object Manager Paths could be used to provide a mapping where there is a root directory (allowing, if needed, there always to be a relative path from one directory to another). - Modelling the separator internally as being
/everywhere may sound elegant but, especially as EIO is internally using NT APIs using object manager paths (where/is not the directory separator) it would possibly be better to use a stronger type than relying on a character (e.g. at its extreme,string list, but also using abstract/private strings to track the various conversions needed). - Just for reference, there's a similar issue manipulating directory lists in environment variables which affects opam 2.2 a lot. The best solution I came up with there is tracking lists where both the normalised and raw forms of the directory entries are kept together. In opam, this is done to keep separators consistent - the nightmare in environment variables comes from
:being an entry separator on Unix and so prohibited in directory names but being the volume separator on Windows, so the representation allowsC:\nasty path "with redundant quoting" with ";" and spaces\bin;C:\Windows\system32to be internally treated as two entriesC:\nasty path with redundant quoting with ; and spaces\bin(that's the actual directory name) andC:\Windows\system32but also to be re-displayed to the user with the same quoting as was provided (i.e. keeping the double-quotes where they were). A similar idea could be applied to the directory separators here - i.e. if the user providesC:\Users/DRA\Documents/Wavy\Separators/Are\In/Fashionthen internally having eitherC:/Users/DRA/Documents/Wavy/Separators/Are/In/Fashionor\??\C:\Users\DRA\Documents\Wavy\Separators\Are\In\Fashionis fine, but if the directory is for whatever reason displayed or passed back to the user then only the original form should be seen.
I think rather than trying to identify a concept of a root directory, it's better instead to let go of the notion entirely
Indeed, and Eio doesn't actually require this anyway. The directory fs refers to the current directory. It's just that it allows access to the rest of the file-system from there (so e.g. fs / ".." can be used to refer to the parent directory). Possibly it should have been called cwd_unconfined or something.
the representation allows
C:\nasty path "with redundant quoting" with ";" and spaces\bin;C:\Windows\system32to be internally treated as two entriesC:\nasty path with redundant quoting with ; and spaces\bin(that's the actual directory name)
I don't quite understand this. Does this mean that Windows doesn't allow quotes in names, but will ignore them if given?