Windows device path and long-path meta issue.

Open ehuss opened this issue 4 years ago • 19 comments

This is a meta issue to coordinate the different issues related to handling device paths and long paths on Windows (such as \\?\ or \\.\). There are several places where Cargo does not handle these well, but it is not clear exactly how they all should be approached. Changes for these require careful consideration, and it's not clear what a general good approach would look like. Some rough thoughts to consider:

  • Where exactly are the problems? (Making a very clear overview would be an extremely helpful way to help here!)
  • To what degree should we strive for long-path support? Having the target directory exceed MAX_PATH seems like it would be quite difficult due to issues like https://github.com/rust-lang/rust/pull/86406. Manifests require a registry setting that is off by default.
    • If we don't or can't support MAX_PATH paths, does it make sense to ever use device paths? Can they just be converted to normal paths and make Windows handle its regular normalization?
    • Is supporting long paths feasible without a manifest?
  • Should the fixes be primarily done to the standard library?
  • Should Cargo use an external library (like dunce), or should it all be internal? What should be done with normalize_path?
  • How to approach normalization/canonicalization? There are classic issues like whether to follow symlinks, but also the troubles of using Rust's canonicalize function on Windows.
  • Should Cargo try to avoid device (aka verbatim) paths as much as possible?
  • Are there other ways to lean on Win32 normalization (like \\.\ or GetFullPathNameW)?
  • Should it try to use \\?\GLOBALROOT\ style paths (see https://github.com/rust-lang/rust/pull/86447)?
  • Is it feasible to just translate \\?\ paths to \\.\, and rely on the Win32 API to do normalization? This would not support long-paths, but there are many other problems with long paths. (Probably not, just tossing out the idea.)

Linking issues and PRs:

  • PR #8964 — Another fix for workspace path joining
  • PR #8881 — Fix --manifest-path for verbatim paths
  • PR #8874 — Append workspace paths using components
  • Issue #8626 — Crash when using \?\ style path on Windows
  • Issue #7986 — Add longPathAware to the app manifest on Windows
  • PR #7729 — fixed workspaces issues due to non-canonical manifest path
  • Issue #7686 — Workspace path is not fully resolved
  • Issue #7643 — Crash on non-absolute path in Windows
  • Issue #13020 — Failure to load git dependencies with long file names on Windows
  • Issue #6198 — "Package collision in lockfile" using UNC/root local device/etc paths
  • Issue #2516 — Crate failed to unpack on Windows (os error 3) - path name too long
  • Issue #13141 — Investigate better Windows long-path error handling
  • Issue #13919 — Windows \\?\ verbatim paths break idiomatic use of OUT_DIR and include!

ehuss avatar Aug 06 '21 19:08 ehuss

Since my PR was linked here, I would add that I'd really love to fix these issues in the standard library so everyone can benefit by default. My thinking at the moment is that Rust should auto-convert to \\?\ style paths whenever a filesystem function is called. Also, Display should always use the more natural C:\ style paths (where possible) when showing paths to users.

GetFullPathNameW will be very useful here but it'd also be good to be able to work directly with WTF-8 paths so there's not a need to convert to UTF-16 and back when that's not necessary.
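As a rough illustration of the auto-conversion idea, here is a minimal sketch that works on plain strings so it can run on any platform. The helper name `to_verbatim` is hypothetical and this is not how the standard library implements it; it assumes the input is already absolute and normalized (for example by GetFullPathNameW), because verbatim paths are not normalized afterwards.

```rust
/// Hypothetical helper: turn an absolute, already-normalized Windows path
/// string into its verbatim (`\\?\`) form. Returns `None` for inputs this
/// sketch does not try to convert (relative paths, `\\.\` device paths).
fn to_verbatim(path: &str) -> Option<String> {
    // Already verbatim: nothing to do.
    if path.starts_with(r"\\?\") {
        return Some(path.to_string());
    }
    // Device paths such as `\\.\PIPE\name` are left alone in this sketch.
    if path.starts_with(r"\\.\") {
        return None;
    }
    // UNC path: `\\server\share` becomes `\\?\UNC\server\share`.
    if let Some(rest) = path.strip_prefix(r"\\") {
        return Some(format!(r"\\?\UNC\{}", rest));
    }
    // Drive path: `C:\...` becomes `\\?\C:\...`.
    let bytes = path.as_bytes();
    if bytes.len() >= 3
        && bytes[0].is_ascii_alphabetic()
        && bytes[1] == b':'
        && bytes[2] == b'\\'
    {
        return Some(format!(r"\\?\{}", path));
    }
    // Relative or otherwise unrecognized input.
    None
}
```

Calling `to_verbatim(r"C:\path\to\file")` yields `\\?\C:\path\to\file`, while a relative path such as `src\main.rs` yields `None` and would have to be made absolute first.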

Btw, in case it helps someone, I've started writing about Windows paths, in an attempt to go into some detail. It's still a work in progress, so sorry if there are any mistakes or anything is unclear.

ChrisDenton avatar Aug 16 '21 18:08 ChrisDenton

Thanks for posting your writeup! I think it would be great to have a resource like that. Microsoft's own documentation is a little scattered and lacking, and having one place clearly describing things would be great. Let me know if you ever want feedback on it.

ehuss avatar Aug 16 '21 21:08 ehuss

Sure, I'd very much welcome feedback! I admit I mostly wrote it for myself, which is why it's currently a "secret" gist, so I'd appreciate any help in making it more useful for others.

ChrisDenton avatar Aug 16 '21 21:08 ChrisDenton

@ehuss Is this just waiting on a decision from the Cargo team? I originally created normpath to fix these issues, but the team appears to be less certain now about how these issues should be addressed.

dylni avatar Aug 23 '21 02:08 dylni

@ChrisDenton Instead of fiddling around with NT paths and unnecessarily making life harder, why not use Embedded Manifests per executable ? Embedded Manifests were designed for a reason.

ghost avatar Sep 03 '21 22:09 ghost

@mshaikhcool Enabling the manifest option for long paths would be great! However, it has several limitations which means it doesn't solve all problems. It only works in Windows 10 version 1607 and newer. It requires the user to have admin rights and to change a registry entry. It doesn't fix the issue with broken drivers if you actually do want to resolve symlinks or get an absolute path.

ChrisDenton avatar Sep 04 '21 11:09 ChrisDenton

It requires the user to have admin rights and to change a registry entry.

The person who will be installing Rust in the first place will most likely be A Programmer, so this point is moot.
There's a good reason why longPathAware is not enabled by default and why explorer.exe doesn't embed longPathAware in its manifest. Don't expect it to be enabled by default in the near future.

It only works in Windows 10 version 1607 and newer.

Ah, Classic. Symlinks were introduced in Vista. By that logic, we should not also use symlinks because XP didn't support it.

The point here is: why are you even bothering to support an OS that is out of extended support (like Windows 7) in the first place? Add a check for the Windows version and for whether the registry key is enabled. If it is off (e.g. on Windows 7), long paths should simply not work.

It doesn't fix the issue with broken drivers if you actually do want to resolve symlinks or get an absolute path.

NT UNC(\\?\) is designed to be used by Subsystems themselves and Drivers, not user mode programs. longPathAware in both Embedded and Side by Side type Manifests are designed to be used by win32 subsystem's user mode programs, not Drivers.

Both have different purposes, and Rust should implement both as a systems programming language. Rust should not call undocumented APIs and should adhere to strict, clean programming principles.

ghost avatar Sep 04 '21 13:09 ghost

As I said, using the longPathAware manifest option is great. It helps with a lot of problems and should be done when possible.

However, it alone does not fix all the issues here nor in all circumstances. So other solutions need to be explored as well. Nobody is talking about using undocumented APIs. The use of \\?\ style paths is documented for every function that accepts them (e.g. CreateFileW).

ChrisDenton avatar Sep 04 '21 14:09 ChrisDenton

I think there's a fundamental misunderstanding here. Use NT UNC (\\?\) for drivers, and longPathAware for the win32 subsystem's user mode programs.

Cargo and rustc are themselves user mode programs, so they should implement longPathAware for themselves in either an embedded or a side-by-side manifest, and should support NT UNC (\\?\) for building drivers.

I hope it's clear now.

ghost avatar Sep 04 '21 14:09 ghost

I would suggest reading Win32 File Namespaces because it feels like we're talking about different things.

ChrisDenton avatar Sep 04 '21 14:09 ChrisDenton

Can you point me to the exact win32 APIs/scenarios where a manifest file doesn't work but the alternative does?

ghost avatar Sep 04 '21 15:09 ghost

The manifest option is not sufficient to solve all issues listed here. For example:

  • It does not enable long paths if the user cannot or will not enable long path awareness in the registry (e.g. OS too old, IT policies, etc).
  • Even with the manifest option, fs::canonicalize returns \\?\ prefixed paths so Rust needs to be able to understand them and, ideally, have a way to convert them to more normal looking paths for display or other uses.
  • Ditto for user supplied paths which can be in any form (even explorer accepts \\?\ style paths).
  • fs::canonicalize can also completely fail with certain RAM drives. The manifest is no help here because that's a different problem.

ChrisDenton avatar Sep 04 '21 15:09 ChrisDenton

It does not enable long paths if the user cannot or will not enable long path awareness in the registry (e.g. OS too old, IT policies, etc).

this point is already moot because of Rust's potential usecases. not sure why it's being thrown around each time. Note that : Windows will always be a backward-compatible OS by default. Users must perform changes by themselves to make it forward-compatible.

To use long paths Users must use windows 10 and must enable group policy or registry. Note: this is the Microsoft recommended way. Going against MS's own recommendation does indeed sound like a Desperate Excuse to say "Won't Do".

Even with the manifest option, fs::canonicalize returns \\?\ prefixed paths

This is an absolutely horrible implementation. The amount of existing software that breaks because of UNC, including Microsoft's own, is a big enough reason to abandon UNC and "the Excuse" presented above.

there are already proposals for abandoning that in favor of returning win32 absolute paths.

It doesn't fix the issue with broken drivers if you actually do want to resolve symlinks or get an absolute path.

Ah, now realized where the confusion is. this doesn't fix this issue is based on above horrible UNC return implementation which itself is wrong to begin with.

Ditto for user supplied paths which can be in any form

manifest just removes hard coded static buffer size from *W functions. the rest behaviors are unchanged.

(even explorer accepts \\?\ style paths).

Explorer doesn't accept \\?\ paths, it Simply ignores supplied \\?\ prefix.
explorer simply converts this \\?\C:\VeryLong255CharPath\VeryLong255Foo\ to 8.3 Path format c:\VERYLO~1\VERYLO~2 for it to access.

fs::canonicalize can also completely fail with certain RAM drives.

I guess you meant RAM Disk by that. Creating a RAM Disk requires a KMDF device driver. Device drivers use win32 device paths (\\.\) and then make a symlink to a win32 file path (\\?\) for user mode applications to access. Manifest files of both types, Embedded/Fusion and Side By Side (Foo.exe.manifest file), should work as expected for RAM Disks too.

certain RAM Disk sounds like the software in question's KMDF Driver bug, We shouldn't have to cripple Rust for that.

The manifest is no help here because that's a different problem.

Manifest files have to do with win32 file paths (\\?\) and have nothing to do with win32 device paths (\\.\). Most Windows APIs don't take \\.\ device paths as parameters, as they can already access devices through symlinked \\?\ paths.

Now with Manifest support , one doesn't even need to attach \\?\ prefix or deal with UNC Path handling complexities anymore.

ghost avatar Sep 04 '21 23:09 ghost

NT UNC(\\?\) is designed to be used by Subsystems themselves and Drivers, not user mode programs.

Hi. Can someone point me to the origin of this statement? Are there any MS docs which can be linked?

Explorer doesn't accept \\?\ paths, it Simply ignores supplied \\?\ prefix. explorer simply converts this \\?\C:\VeryLong255CharPath\VeryLong255Foo\ to 8.3 Path format c:\VERYLO~1\VERYLO~2 for it to access.

Maybe i'm doing it wrong, or has this changed in newer Windows Versions?

image

jessesna avatar Dec 13 '21 19:12 jessesna

Here's a brief guide to Windows paths, some of the issues involved, and what the standard library has done and is doing to address them. None of this is cargo specific but I hope it helps nonetheless. I'll try to keep this short but I fear I might fail.

Terminology cheat sheet

Path                                    Term
C:\path\to\file                         Drive path
\\server\share                          UNC path
\\.\PIPE\name                           Device path (used for pipes, printers, etc)
\\?\C:\path\to\file                     Verbatim paths
\\?\UNC\server\share
\\?\PIPE\name
\??\C:\path\to\file                     NT kernel paths (not used in Win32 APIs)
\Device\HarddiskVolume2\path\to\file

NT kernel paths are what both verbatim and the non-verbatim paths end up as but aren't otherwise usable in most user space APIs. So when I say "non-verbatim paths" I mean the first three paths in the table and not including kernel paths.
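To make the terminology concrete, here is a small sketch that sorts a path string into the categories of the cheat sheet by looking only at its prefix. The `PathKind` enum and `classify` function are hypothetical names invented for this illustration, and the sketch only recognizes the backslash spellings shown in the table (non-verbatim paths may also be written with `/`).

```rust
/// Hypothetical categories mirroring the terminology cheat sheet.
#[derive(Debug, PartialEq)]
enum PathKind {
    Drive,
    Unc,
    Device,
    Verbatim,
    NtKernel,
    Other,
}

/// Classify a path string purely by its prefix. The order of the checks
/// matters: `\\?\` and `\\.\` must be tested before the generic `\\` of UNC.
fn classify(path: &str) -> PathKind {
    let b = path.as_bytes();
    if path.starts_with(r"\\?\") {
        PathKind::Verbatim
    } else if path.starts_with(r"\\.\") {
        PathKind::Device
    } else if path.starts_with(r"\??\") || path.starts_with(r"\Device\") {
        PathKind::NtKernel
    } else if path.starts_with(r"\\") {
        PathKind::Unc
    } else if b.len() >= 3
        && b[0].is_ascii_alphabetic()
        && b[1] == b':'
        && (b[2] == b'\\' || b[2] == b'/')
    {
        PathKind::Drive
    } else {
        // Relative paths, drive-relative paths like `C:file`, and so on.
        PathKind::Other
    }
}
```

For example, `classify(r"\\?\UNC\server\share")` returns `PathKind::Verbatim` even though it names a network share, which matches the way the table groups all `\\?\` forms together.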

Verbatim paths

Verbatim paths are passed almost directly to the kernel (except \\?\ is changed to \??\). These are always absolute and don't contain . or .. components because those will simply be treated as normal components (e.g. . is a perfectly valid file or directory name according to the kernel, though most filesystem drivers will probably reject it). Also / is not a path separator; in fact everything except \ is not special in any way.

The term "verbatim" is not official terminology but it's the one used by the Rust standard library for lack of an official name.

Non-verbatim paths

Unlike a verbatim path, other paths are subject to limits (such as MAX_PATH, unless a manifest is used) and are parsed in more complex ways. Parsing of non-verbatim paths includes (but is not limited to):

  • If a drive path ends with a special DOS device name, it will be turned into a device path. E.g. C:\path\to\aux.txt will become the NT kernel path \??\aux.
  • If the path includes any . or .. components, these will be resolved lexically. Or to put it another way, resolving .. doesn't read a symlink. It simply does the equivalent of PathBuf::pop().
  • It will trim any trailing . and (space) from the path.
  • Any / will be converted to \ and consecutive \'s will be collapsed into one \.
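The lexical handling of . and .. described above can be sketched in a few lines of Rust. This is only an illustration of the idea (the function name `normalize_lexically` is made up here), not the Win32 algorithm itself: it never touches the filesystem, so a `..` that follows a symlink is resolved differently from what the OS would do when actually walking the path.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `.` and `..` purely lexically, without reading the filesystem.
fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` contributes nothing.
            Component::CurDir => {}
            // `..` removes the previous component, like `PathBuf::pop()`.
            // At the root (or on an empty buffer) `pop` does nothing, so a
            // `..` that would climb above the start is silently dropped.
            Component::ParentDir => {
                out.pop();
            }
            // Prefixes, the root, and normal names are copied through.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

With this sketch, `/a/b/../c/./d` becomes `/a/c/d` regardless of whether `b` is a symlink.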

Path Issues

  • Non-verbatim paths are subject to the legacy maximum path limit (usually 260 UTF-16 code units but sometimes 248).
  • DOS device file names will be converted to a device path when used. So you may end up opening, say, a console handle instead of a file. This can be a particular problem when moving paths from a *nix system.
  • std::fs::canonicalize returns verbatim paths which path handling routines can sometimes struggle to deal with. As discussed above, . and .. components should not appear in verbatim paths and / won't be automatically converted to \.
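A tool that wants to warn about the first two issues before handing a path to Win32 could use checks along these lines. Both function names are hypothetical, the device-name list covers only the classic names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9), and the exact matching rules are an approximation, so treat this as a sketch rather than a complete implementation.

```rust
/// The legacy limit, counted in UTF-16 code units including the
/// terminating NUL.
const MAX_PATH: usize = 260;

/// True if a non-verbatim path would not fit in a MAX_PATH-sized buffer.
fn exceeds_max_path(path: &str) -> bool {
    // Rust strings are UTF-8, so count UTF-16 code units explicitly and
    // add one for the NUL terminator the Win32 API expects.
    path.encode_utf16().count() + 1 > MAX_PATH
}

/// True if a single file name (not a whole path) looks like a classic DOS
/// device name. Extensions and trailing spaces are ignored, so `aux.txt`
/// and `NUL ` both match.
fn is_dos_device_name(file_name: &str) -> bool {
    // Keep the part before the first '.', then drop trailing spaces.
    let stem = file_name
        .split('.')
        .next()
        .unwrap_or("")
        .trim_end_matches(' ');
    let upper = stem.to_ascii_uppercase();
    match upper.as_str() {
        "CON" | "PRN" | "AUX" | "NUL" => true,
        // COM1..COM9 and LPT1..LPT9.
        s if s.len() == 4 && (s.starts_with("COM") || s.starts_with("LPT")) => {
            matches!(s.as_bytes()[3], b'1'..=b'9')
        }
        _ => false,
    }
}
```

A check like `is_dos_device_name("aux.txt")` is the kind of thing that helps when a repository created on a *nix system is unpacked on Windows.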

Filesystem issue

std::fs::canonicalize can fail if the root drive's driver does not implement a necessary kernel interface. This is normally not an issue but there is at least one popular RAM drive software that uses such a broken driver and is a reliable source of bug reports (not just for Rust applications).

Rust standard library

Rust's standard library is addressing these issues in a number of ways:

  • rust-lang/rust#89270 correctly handles pushing . and .. components to verbatim paths as well as converting / to \. This is stable in Rust 1.57
  • rust-lang/rust#89174 converts the path to verbatim (if necessary) just before it's used in filesystem APIs. This ensures long paths can always be used even if the user hasn't opted in to them. This will be stable in Rust 1.58.
  • rust-lang/rust#91673 (not yet accepted) implements std::path::absolute as a drop-in replacement for std::fs::canonicalize. This will allow creating an absolute path without having to use canonicalize. It could also be used to check for device paths such as \\.\AUX, which aren't filesystem paths.
  • rust-lang/rust#86447 (not yet merged) would allow std::fs::canonicalize to succeed in more situations. But it's pending a decision on the funky paths it can return instead of failure.
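For the std::path::absolute item, a minimal usage sketch looks like the following. It assumes a toolchain recent enough to ship `std::path::absolute`; the wrapper name `absolute_without_canonicalize` is made up for this example. Unlike `std::fs::canonicalize`, the call does not require the path to exist and does not resolve symlinks.

```rust
use std::io;
use std::path::{self, PathBuf};

/// Make a path absolute without resolving symlinks and without requiring
/// that the path exists. An empty path is reported as an error.
fn absolute_without_canonicalize(input: &str) -> io::Result<PathBuf> {
    path::absolute(input)
}
```

Calling `absolute_without_canonicalize("target/debug/build")` returns an absolute path based on the current directory even if the directory has not been created yet, whereas `std::fs::canonicalize` returns an error for a path that does not exist.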

Outstanding issues

The standard library does not provide a public API to convert between verbatim and non-verbatim paths. Currently the best option would be to use a third party crate for this.
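To show roughly what such a conversion involves, here is a deliberately simplified sketch that strips the verbatim prefix only when the result should mean the same thing to Win32. The names `simplify_verbatim` and `survives_win32_parsing` are hypothetical, and the safety check is incomplete (for instance it does not look for reserved DOS device names), so a maintained third-party crate remains the better choice in practice.

```rust
const MAX_PATH: usize = 260;

/// Conservative check that Win32 normalization would not alter the path.
fn survives_win32_parsing(candidate: &str) -> bool {
    // Leave room for the terminating NUL.
    let short_enough = candidate.encode_utf16().count() < MAX_PATH;
    // Components ending in '.' or ' ' would be trimmed (this also catches
    // `.` and `..`), and '/' would be turned into a separator.
    let plain_components = candidate
        .split('\\')
        .all(|c| !c.ends_with('.') && !c.ends_with(' ') && !c.contains('/'));
    short_enough && plain_components
}

/// Convert `\\?\C:\...` to `C:\...` and `\\?\UNC\server\share` to
/// `\\server\share` when that looks safe; otherwise return the input as is.
fn simplify_verbatim(path: &str) -> String {
    if let Some(rest) = path.strip_prefix(r"\\?\UNC\") {
        let candidate = format!(r"\\{}", rest);
        if survives_win32_parsing(&candidate) {
            return candidate;
        }
    } else if let Some(rest) = path.strip_prefix(r"\\?\") {
        let b = rest.as_bytes();
        let is_drive = b.len() >= 3
            && b[0].is_ascii_alphabetic()
            && b[1] == b':'
            && b[2] == b'\\';
        if is_drive && survives_win32_parsing(rest) {
            return rest.to_string();
        }
    }
    // Not verbatim, not a recognized form, or not safe to simplify.
    path.to_string()
}
```

The fallback of returning the verbatim path unchanged is a design choice in this sketch: a path that is too long or contains a component such as `.` can only be addressed correctly in its verbatim form.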

The current directory is always limited by MAX_PATH unless a manifest file is used and the user opts in to enabling long path support. This cannot be fixed by the standard library itself because verbatim paths do not work for the get/set current directory APIs (or rather, they technically work but other Windows APIs will get very confused by it).

ChrisDenton avatar Dec 14 '21 01:12 ChrisDenton

It does not enable long paths if the user cannot or will not enable long path awareness in the registry (e.g. OS too old, IT policies, etc).

this point is already moot because of Rust's potential usecases. not sure why it's being thrown around each time. Note that : Windows will always be a backward-compatible OS by default. Users must perform changes by themselves to make it forward-compatible.

How is enabling long path awareness when the application manifest enables it, but the registry doesn't, not backwards-compatible? If the application itself opts in, why is an additional system-wide opt-in necessary for backwards compatibility?

bjorn3 avatar Jan 10 '22 22:01 bjorn3

If the application itself opts in, why is an additional system-wide opt-in necessary for backwards compatibility?

I think we can only guess, I haven't seen any explanation from Microsoft. I suspect it is because other programs may fail to access those paths. For example, I believe when it was first added, Explorer couldn't handle those long paths. It introduces an environment where various programs would suddenly start breaking in unpleasant ways when interacting with programs that are long-path aware.

It could also be a security issue similar to how symbolic links are restricted.

ehuss avatar Jan 10 '22 23:01 ehuss