request-for-implementation
request-for-implementation copied to clipboard
A crate to handle Windows' "exotic" paths
Std path does some weird things with UNC paths (\\?\). For one std::fs::canonicalize always returns one whether it is needed or not. Then path.join will just do a string concatenation leading to invalid paths. This leads to bugs in many important parts of the ecosystem. (Cargo, wasm-pack, Rustup)
So we need a library that provides a binding to GetFullPathNameW on windows and uses std::fs::canonicalize otherwize.
More ambitiously we need a path "interpreter" that ".. and . while appending. So if you started with \\?\C:\bar and joined ../foo you would iterate the components of ../foo and apply them to the base path, first applying .. to get \\?\C:\ and then applying foo to get \\?\C:\foo."
Ether part of this would be a valuable addition to the ecosystem!
Is this something that could be fixed in fs::canonicalize and Path::join? If not, is that because of backward compatibility restrictions or because the current behavior is useful and worth keeping? If the latter, how would the spec for this new library's canonicalize and join functionality be different from what std canonicalize and join claim to do?
So @retep998 knows this better than I.
My impression is that the fs::canonicalize is locked by backward compatibility and adding an alternative like fs::normalize is blocked on the lib team being sure it is a good design (https://github.com/rust-lang/rust/issues/59117) The impression is that a crate could iterate and suggest uplift when polished.
My impression is that the std::Path::join api is technically correct. Technically \\?\C:\bar\../foo is a valid path on windows, it is "on drive C in folder bar in folder ../foo" and so joining \\?\C:\bar\ with ../foo should give you \\?\C:\bar\../foo. However if my Cargo toml claims that foo = { path = "../foo" }, then Cargo crashes if my working directory is \\?\C:\bar\, we need a library that does the same modifications to the path as joining ../foo would on a normal path.
Thanks, makes sense. Would something like the following API be sufficient for what Cargo needs? :
pub struct SanePath {...}
impl SanePath {
pub fn normalize<P: AsRef<Path>>(path: P) -> Self;
pub fn join<P: AsRef<Path>>(&self, path: P) -> Self;
pub fn as_std_path(&self) -> &Path;
}
impl AsRef<Path> for SanePath {...}
Yes! Or freestanding functions:
pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {...}
pub fn join<P: AsRef<Path>>(base: &mut PathBuf, addition: P) {...}
fs::canonicalize is not broken, but rather it does a fundamentally different operation than what fs::normalize would do. One is asking the OS for the canonical path to a file that actually exists, the other is just turning a relative path into an absolute path.
I'd like to tackle this one, but from this issue I didn't understand whether popping on a .. and doing nothing on a . is enough (there are some edge cases but that the gist of it). This may break symlinks, is this desired? Are there any caveats one should be aware of?
I think it is ok to brake symlinks. \\?\C:\bar/../foo is a valid windows path, but we don't want the folder named bar/../foo in C:\. We want the folder foo in C:\. Dose that makes sense?
Are https://crates.io/crates/path-absolutize and https://crates.io/crates/path-dedot what you need?
Thanks for the links. Now that I have had time to read it carefully... I don't know. path-absolutize does not use GetFullPathNameW, witch is grate for cross compat, but makes me nervous that it will differ from the OS behavior. Neither crate has tests for UNC paths or cases of mixtures of / and \. From the examples and skimming the code I don't know how well it handles non-unicode data, but I am not sure how well std does ether.
I saw a post related to this on Reddit recently, and I'd like to try implementing this.
@Eh2406 @dtolnay could you have a look at my repository (https://github.com/ajeetdsouza/pathology) and tell me your initial thoughts? I've currently only written an implementation for Windows, but I'm working on one for Linux too.
The only function I've implemented is normalize, which lexically converts the path to its simplest form, without actually querying the filesystem.
On Windows, I'm using GetFullPathNameW for this. Since drive letters are case insensitive, but are written in uppercase by convention, I'm capitalizing it manually (GetFullPathNameW doesn't always do this). Error handling is still a WIP, I'll get that done soon enough.
On Linux, there is no system call to lexically normalize paths, but GNU realpath can do this via realpath -ms. I'm currently rewriting that logic in Rust.
Update: I've added normalize on Unix, too.
@ajeetdsouza sorry for the slow reply. That looks like a great start on a answer to the question "I use your_thing(&std::fs::canonicalize(...)) and it broke, what should I use instead?" I look forward to having that in our arsenal. Thank you!
@retep998 may be able to revue the windows api calls better than I.
Related note for others, dunce is an alternative to fs::canonicalize.
normpath can now be used to solve this issue. It defines BasePath, which is very similar to SanePath, and PathExt::normalize can be used for normalization.
It would be great if @retep998 could review this crate as well.
@Eh2406 What would be the next steps for integrating this into Cargo?
Last time I had this paged in https://github.com/rust-lang/cargo/issues/6198 was the best link. Looks like we just got a related PR https://github.com/rust-lang/cargo/pull/8874, so coordinating my make sense. Another good place to start would be to grep for canonicalize in the code base. Sorry I don't remember this stuff better.
@Eh2406 Thanks. I'll start looking through where the path handling can be improved