A crate to handle Windows' "exotic" paths

Open Eh2406 opened this issue 6 years ago • 17 comments

Std path does some weird things with UNC paths (\\?\). For one, std::fs::canonicalize always returns one, whether it is needed or not. Then path.join just does a string concatenation, leading to invalid paths. This leads to bugs in many important parts of the ecosystem (Cargo, wasm-pack, Rustup).

So we need a library that provides a binding to GetFullPathNameW on Windows and uses std::fs::canonicalize otherwise. More ambitiously, we need a path "interpreter" that resolves .. and . while appending. So if you started with \\?\C:\bar and joined ../foo, you would iterate the components of ../foo and apply them to the base path, first applying .. to get \\?\C:\ and then applying foo to get \\?\C:\foo.

Either part of this would be a valuable addition to the ecosystem!

Eh2406 avatar Jun 27 '19 14:06 Eh2406

Is this something that could be fixed in fs::canonicalize and Path::join? If not, is that because of backward compatibility restrictions or because the current behavior is useful and worth keeping? If the latter, how would the spec for this new library's canonicalize and join functionality be different from what std canonicalize and join claim to do?

dtolnay avatar Jun 27 '19 16:06 dtolnay

So @retep998 knows this better than I.

My impression is that fs::canonicalize is locked by backward compatibility, and adding an alternative like fs::normalize is blocked on the lib team being sure it is a good design (https://github.com/rust-lang/rust/issues/59117). The impression is that a crate could iterate and suggest uplift when polished.

My impression is that the std::path::Path::join API is technically correct. Technically \\?\C:\bar\../foo is a valid path on Windows: it is "on drive C in folder bar in folder ../foo", and so joining \\?\C:\bar\ with ../foo should give you \\?\C:\bar\../foo. However, if my Cargo.toml claims that foo = { path = "../foo" }, then Cargo crashes if my working directory is \\?\C:\bar\. We need a library that does the same modifications to the path as joining ../foo would on a normal path.

Eh2406 avatar Jun 27 '19 18:06 Eh2406

Thanks, makes sense. Would something like the following API be sufficient for what Cargo needs?

pub struct SanePath {...}

impl SanePath {
    pub fn normalize<P: AsRef<Path>>(path: P) -> Self;
    pub fn join<P: AsRef<Path>>(&self, path: P) -> Self;
    pub fn as_std_path(&self) -> &Path;
}

impl AsRef<Path> for SanePath {...}

dtolnay avatar Jun 27 '19 19:06 dtolnay

Yes! Or freestanding functions:

pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {...}
pub fn join<P: AsRef<Path>>(base: &mut PathBuf, addition: P) {...}
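
The component-by-component join described earlier in the thread could be sketched roughly as follows. This is only an illustration under stated assumptions: `join_lexical` is a hypothetical name (not part of std or of any crate mentioned here), and it resolves `..` purely lexically, so it never consults the filesystem and does not follow symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: appends `addition` to `base` one component at a
/// time, applying `.` and `..` instead of concatenating strings.
pub fn join_lexical<P: AsRef<Path>>(base: &mut PathBuf, addition: P) {
    for component in addition.as_ref().components() {
        match component {
            // A prefix or root in `addition` replaces (part of) the base,
            // which is what `PathBuf::push` already does.
            Component::Prefix(_) | Component::RootDir => {
                base.push(component.as_os_str());
            }
            // `.` means "stay here": nothing to do.
            Component::CurDir => {}
            // `..` drops the last component. At a root, `pop` returns false
            // and the path is left unchanged. Caveat: on an empty relative
            // base the `..` is silently discarded.
            Component::ParentDir => {
                base.pop();
            }
            // An ordinary name is appended as-is.
            Component::Normal(name) => base.push(name),
        }
    }
}
```

With this sketch, starting from \\?\C:\bar on Windows and joining ../foo should first pop to \\?\C:\ and then push foo; the same logic applied to /bar and ../foo on Unix yields /foo.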

Eh2406 avatar Jun 27 '19 19:06 Eh2406

fs::canonicalize is not broken, but rather it does a fundamentally different operation than what fs::normalize would do. One is asking the OS for the canonical path to a file that actually exists, the other is just turning a relative path into an absolute path.
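
To make that distinction concrete, here is a minimal sketch. `absolutize` is a hypothetical helper, not an existing std function; it only prepends the current directory and does not resolve `..`, so it is a stand-in for the "relative to absolute" half of the idea rather than a full normalize.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: turns `path` into an absolute path without checking
/// whether it exists. Only the current directory is queried.
pub fn absolutize<P: AsRef<Path>>(path: P) -> io::Result<PathBuf> {
    // `join` returns `path` unchanged when it is already absolute.
    Ok(env::current_dir()?.join(path))
}

/// Contrast: `fs::canonicalize` asks the OS about a real file, so it
/// returns an error when the path does not exist.
pub fn canonical_or_none<P: AsRef<Path>>(path: P) -> Option<PathBuf> {
    std::fs::canonicalize(path).ok()
}
```

For a path that does not exist, `canonical_or_none` yields `None`, while `absolutize` still succeeds, because it never looks at the file itself.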

retep998 avatar Jun 27 '19 21:06 retep998

I'd like to tackle this one, but from this issue I didn't understand whether popping on a .. and doing nothing on a . is enough (there are some edge cases, but that's the gist of it). This may break symlinks; is this desired? Are there any caveats one should be aware of?

basil-cow avatar Nov 09 '19 22:11 basil-cow

I think it is OK to break symlinks. \\?\C:\bar/../foo is a valid Windows path, but we don't want the folder named bar/../foo in C:\. We want the folder foo in C:\. Does that make sense?

Eh2406 avatar Nov 10 '19 01:11 Eh2406

Are https://crates.io/crates/path-absolutize and https://crates.io/crates/path-dedot what you need?

basil-cow avatar Nov 10 '19 18:11 basil-cow

Thanks for the links. Now that I have had time to read it carefully... I don't know. path-absolutize does not use GetFullPathNameW, which is great for cross-platform compatibility, but makes me nervous that it will differ from the OS behavior. Neither crate has tests for UNC paths or for mixtures of / and \. From the examples and skimming the code, I don't know how well it handles non-Unicode data, but I am not sure how well std does either.

Eh2406 avatar Nov 15 '19 22:11 Eh2406

I saw a post related to this on Reddit recently, and I'd like to try implementing this.

@Eh2406 @dtolnay could you have a look at my repository (https://github.com/ajeetdsouza/pathology) and tell me your initial thoughts? I've currently only written an implementation for Windows, but I'm working on one for Linux too.

The only function I've implemented is normalize, which lexically converts the path to its simplest form, without actually querying the filesystem.

On Windows, I'm using GetFullPathNameW for this. Since drive letters are case-insensitive, but are written in uppercase by convention, I'm capitalizing the drive letter manually (GetFullPathNameW doesn't always do this). Error handling is still a WIP; I'll get that done soon enough.
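
As a rough illustration of the drive-letter step only (not the code in the repository above), a string-based sketch might look like this. `capitalize_drive_letter` is a hypothetical name, and it assumes the path is valid UTF-8 and starts directly with the drive letter; it does not look inside a \\?\ prefix.

```rust
/// Hypothetical helper: uppercases a leading drive letter such as `c:`.
/// Paths that do not start with `<ascii letter>:` are returned unchanged.
pub fn capitalize_drive_letter(path: &str) -> String {
    let bytes = path.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_lowercase() && bytes[1] == b':' {
        let mut out = String::with_capacity(path.len());
        out.push(bytes[0].to_ascii_uppercase() as char);
        // Byte 0 is ASCII, so slicing at index 1 is on a char boundary.
        out.push_str(&path[1..]);
        out
    } else {
        path.to_owned()
    }
}
```

A real implementation would more likely operate on the wide-character buffer returned by the OS rather than on a `&str`.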

On Linux, there is no system call to lexically normalize paths, but GNU realpath can do this via realpath -ms. I'm currently rewriting that logic in Rust.

ajeetdsouza avatar Aug 29 '20 14:08 ajeetdsouza

Update: I've added normalize on Unix, too.

ajeetdsouza avatar Aug 29 '20 18:08 ajeetdsouza

@ajeetdsouza sorry for the slow reply. That looks like a great start on an answer to the question "I use your_thing(&std::fs::canonicalize(...)) and it broke, what should I use instead?" I look forward to having that in our arsenal. Thank you!

@retep998 may be able to review the Windows API calls better than I.

Eh2406 avatar Sep 11 '20 21:09 Eh2406

Related note for others, dunce is an alternative to fs::canonicalize.

pksunkara avatar Oct 18 '20 10:10 pksunkara

normpath can now be used to solve this issue. It defines BasePath, which is very similar to SanePath, and PathExt::normalize can be used for normalization.

It would be great if @retep998 could review this crate as well.

dylni avatar Nov 09 '20 15:11 dylni

@Eh2406 What would be the next steps for integrating this into Cargo?

dylni avatar Nov 19 '20 23:11 dylni

Last time I had this paged in, https://github.com/rust-lang/cargo/issues/6198 was the best link. Looks like we just got a related PR, https://github.com/rust-lang/cargo/pull/8874, so coordinating may make sense. Another good place to start would be to grep for canonicalize in the code base. Sorry I don't remember this stuff better.

Eh2406 avatar Nov 20 '20 16:11 Eh2406

@Eh2406 Thanks. I'll start looking through where the path handling can be improved.

dylni avatar Nov 20 '20 22:11 dylni