Add normalize_lexically to `Path`
Proposal
Problem statement
On Unix platforms, we take pains to warn about the dangers of naively resolving `..` components (i.e. resolving `/path/to/../file` as `/path/file`, which is wrong if `to` is a symlink). However, that doesn't mean it's never useful. Sometimes, when working within a subdirectory, we don't intend to follow `..` links at all. People also have a habit of writing a literal `..` when they really meant `pop()`. If nothing else, providing a function for this case gives us a central place to document the issue.
Motivating examples or use cases
Say you have a base path and you want the user to be able to use paths below it.
```rust
// You've already checked that `user_path` is not a `/` root path and does not have any prefix,
// but this still has issues because even a relative path such as `../secret` may escape `base_path`.
let subpath = base_path.join(user_path);
```
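To make the problem concrete, here is a small sketch using only stable `std::path` APIs. The helper name `naive_join_is_contained` is hypothetical. It shows that `Path::join` keeps `..` components verbatim and that `Path::starts_with` compares components lexically, so a "is it still under the base?" check passes even though the path escapes.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the naive containment check described above.
/// `join` does not resolve `..`, and `starts_with` only compares leading
/// components, so `../` segments after the base are not noticed.
pub fn naive_join_is_contained(base: &Path, user_path: &Path) -> (PathBuf, bool) {
    let joined = base.join(user_path);
    let contained = joined.starts_with(base);
    (joined, contained)
}
```

For example, joining `../../etc/passwd` onto `/srv/data` yields `/srv/data/../../etc/passwd`, and the check still reports it as contained.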
Solution sketch
Have a function that removes `..` components from the path (each one together with the component before it), in addition to the usual normalization that the components iterator does (such as normalizing separators).
```rust
impl Path {
    // normalizes in place, avoiding an allocation.
    pub fn normalize_lexically(&mut self) -> Result<&mut Self, NormalizeError>;
}
```
Or:
```rust
impl Path {
    // more convenient but always allocates.
    pub fn normalize_lexically(&self) -> Result<PathBuf, NormalizeError>;
}
```
Either way, this would return an error if the path still contains leftover `..` components after normalization, i.e. if a `..` would ascend above the start of the path. For example, `path\..\..\to\file` resolves to `..\to\file`, which would be an error. It could also error if it resolves to the empty path (I'm less sure about this, but unexpectedly empty paths can be a footgun).
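A minimal sketch of the allocating variant, written as a free function on stable Rust rather than as a method on `Path`. `NormalizeError` here is a hypothetical unit struct standing in for whatever error type is chosen. It drops `.`, resolves each `..` by popping the previous normal component, and errors when a `..` has nothing left to pop (including at a root). Of the two options for the empty path, this sketch allows it. It is purely lexical and never consults the filesystem, so it does not follow symlinks.

```rust
use std::fmt;
use std::path::{Component, Path, PathBuf};

/// Hypothetical error: a `..` tried to ascend above the start (or root) of the path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NormalizeError;

impl fmt::Display for NormalizeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("path ascends above its starting point")
    }
}

impl std::error::Error for NormalizeError {}

/// Sketch of lexical normalization (hypothetical free function, not std's API).
pub fn normalize_lexically(path: &Path) -> Result<PathBuf, NormalizeError> {
    let mut out = PathBuf::new();
    // Number of normal components in `out` that a `..` is allowed to remove.
    let mut depth = 0usize;
    for component in path.components() {
        match component {
            // A Windows prefix and/or root is kept; `..` can never pop it.
            Component::Prefix(_) | Component::RootDir => out.push(component),
            // `.` adds nothing.
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    // Leftover `..`: would escape the start or the root.
                    return Err(NormalizeError);
                }
                out.pop();
                depth -= 1;
            }
            Component::Normal(name) => {
                out.push(name);
                depth += 1;
            }
        }
    }
    Ok(out)
}
```

Because the separators come from `PathBuf::push`, the output also gets the usual separator normalization of the components iterator.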
Alternatives
- Instead of returning a `Result`, we could collect any leftover `..` components and place them at the beginning of the path.
- The current name is chosen to be a bit weird so as to highlight that this is a potentially dangerous operation. Maybe another name could be chosen.
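The first alternative can be sketched as a non-erroring variant. The name `normalize_lexically_lossy` is hypothetical. Leftover `..` components of a relative path are moved to the front, while a `..` directly under a root is dropped (since `/..` is `/`). For the three inputs compared later in this thread, it produces the same results as the other languages listed: `a/../../b` gives `../b`, `/a/../../b` gives `/b`, and `../a/../../b` gives `../../b`.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Hypothetical non-erroring variant: never fails, and keeps unresolved `..`
/// at the start of relative paths instead of returning an error.
pub fn normalize_lexically_lossy(path: &Path) -> PathBuf {
    // Prefix and/or root directory, if any.
    let mut anchor = PathBuf::new();
    // Unresolved `..` components (only possible for relative paths).
    let mut leading_parents = 0usize;
    // Normal components that survive so far.
    let mut names: Vec<&OsStr> = Vec::new();

    for component in path.components() {
        match component {
            Component::Prefix(_) | Component::RootDir => anchor.push(component),
            Component::CurDir => {}
            Component::ParentDir => {
                // Pop the previous name; if there is none, either remember the
                // `..` (relative path) or drop it (at a root, `/..` is `/`).
                if names.pop().is_none() && !anchor.has_root() {
                    leading_parents += 1;
                }
            }
            Component::Normal(name) => names.push(name),
        }
    }

    let mut out = anchor;
    for _ in 0..leading_parents {
        out.push("..");
    }
    out.extend(names);
    out
}
```

The trade-off is that callers can no longer tell from the return type whether the path tried to escape, which is why the proposal prefers returning a `Result`.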
Links and related work
- Path Traversal attack is a case where you do want to remove `..` components lexically from a user-provided path.
- C++ `lexically_normal`, `lexically_relative` and `lexically_proximate`
- Java `normalize`
- Go `Clean`
- Node.js `normalize`
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
I've updated this ACP based on notes from the libs-api meeting. It concentrates on what I'm now calling `normalize_lexically`, which seemed to have general support, although there was some uncertainty about the naming and some of the details.
Should this normalize all separators to MAIN_SEPARATOR?
I think that is implied by "in addition to the usual normalization that the components iterator does", since if you call `path.components().collect::<PathBuf>()`, the components are joined using `MAIN_SEPARATOR_STR`.
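A quick way to see what the existing normalization already covers, as a sketch on stable Rust (the helper name `components_roundtrip` is hypothetical). On Unix-like platforms, where `MAIN_SEPARATOR` is `/`, the round trip collapses repeated separators and drops interior `.` components and trailing separators, but leaves `..` untouched, which is the gap this proposal fills.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: round-trips a path through `components()` and
/// re-collects it into a `PathBuf` (joined via `PathBuf::push`).
pub fn components_roundtrip(path: &str) -> PathBuf {
    Path::new(path).components().collect::<PathBuf>()
}
```

For instance, on Unix `a//./b/../c/` comes back as `a/b/../c`.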
I've added that link to the ACP. I've also added some links for other languages.
I've checked all the linked implementations from other languages, and they all behave the same way regarding leftover `..`:
| Implementation | `a/../../b` | `/a/../../b` | `../a/../../b` |
|---|---|---|---|
| Go `path.Clean` | `../b` | `/b` | `../../b` |
| Java `Path.normalize` | `../b` | `/b` | `../../b` |
| Node.js `path.normalize` | `../b` | `/b` | `../../b` |
| C++ `lexically_normal` | `../b` | `/b` | `../../b` |
> Sometimes when working within a subdirectory we don't intend to follow .. links.
Depending on the application, it would probably be safer to have some sort of `join_beneath` or similar, where you specify a trusted prefix and an untrusted suffix, and it only normalizes the suffix as long as it does not ascend out of the prefix.
E.g. https://docs.rs/safe-path/0.1.0/safe_path/fn.scoped_resolve.html
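A rough sketch of the `join_beneath` idea on stable Rust. The function and its `Option` return type are hypothetical and are not the API of the crate linked above. It rejects an untrusted suffix that is absolute or has a prefix, normalizes only the suffix, and returns `None` if a `..` would climb out of the trusted base. Like the rest of this proposal it is purely lexical: a symlink inside the base can still point somewhere else.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical `join_beneath`: joins an untrusted relative `suffix` onto a
/// trusted `base`. Returns `None` if the suffix is absolute, has a Windows
/// prefix, or contains a `..` that would ascend out of `base`.
pub fn join_beneath(base: &Path, suffix: &Path) -> Option<PathBuf> {
    let mut names = Vec::new();
    for component in suffix.components() {
        match component {
            // An absolute or prefixed suffix would replace `base` entirely.
            Component::Prefix(_) | Component::RootDir => return None,
            Component::CurDir => {}
            // Nothing left to pop means the suffix escapes `base`.
            Component::ParentDir => {
                names.pop()?;
            }
            Component::Normal(name) => names.push(name),
        }
    }
    // The base itself is trusted and left exactly as given.
    let mut out = base.to_path_buf();
    out.extend(names);
    Some(out)
}
```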
We discussed this in the @rust-lang/libs-api meeting today. We're happy to accept this with one slight modification: empty paths should just be allowed as-is rather than erroring. Errors should only occur when trying to `..` past the root.
We recognize that the behavior of erroring differs from other languages, but we believe that this behavior is more useful in practice for path validation.