go
go copied to clipboard
proposal: path/filepath: Sanitize
It is common for programs to accept filenames from untrusted sources. For example, an archive extractor might create files based on names in the archive, or a webserver may serve the content of local files identified by a URL path. In these cases, the program should usually sanitize the untrusted filenames before use. (See #55356 for a list of vulnerabilities caused by using unsanitized archive paths.)
We should provide a simple path sanitization function which accepts an untrusted input and returns something reasonably safe.
// Sanitize returns a representation of path under the current
// directory. If path is absolute or represents a location
// outside the current directory, Sanitize returns an error.
//
// Sanitize calls Clean on the result, but retains a trailing Separator if any.
//
// If a base path is joined to the result of Sanitize with Join,
// the resulting path will be contained within basepath.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the sanitized path to reference a location
// outside the current directory.
func Sanitize(path string) (string, error)
Examples:
Sanitize("a") = "a", <nil>
Sanitize("a/b") = "a/b", <nil>
Sanitize("a/b/../c") = "a/c", <nil>
Sanitize("../a") = "", "../a" is outside the current directory
Sanitize("..") = "", ".." is outside the current directory
Sanitize(".") = "", "." is the current directory
Sanitize("/") = "", "/" is absolute
Sanitize("/a") = "", "/a" is absolute
Sanitize("a/") = "a/", <nil>
Sanitize("a/b/") = "a/b/", <nil>
// on Windows
Sanitize("NUL") = "", "NUL" is absolute
https://go.dev/play/p/EDzG8D15Zed contains a sample implementation.
I like the idea but when I saw the issue title, it's purpose wasn't at all clear to me from the name "Sanitize". It's not about cleanliness (we already have "Clean") but security. Maybe something as simple as "Safe", but with a name like that perhaps it should consider symbolic links. Perhaps it should anyway. A conundrum.
I'm not sold on the name, although "Sanitize" does have the connotation of being more than just clean. ("To reduce or eliminate pathogenic agents", says Merriam-Webster.)
Symbolic links are an interesting question. If we do consider them, then we need to know the base path that the untrusted path is relative to. Not all users will want to avoid symbolic links, either; an archive extractor writing to an output file wants to avoid writing through symlinks, but a web server probably wants to follow links in the directory it is serving from. In some cases, there might be a race condition between checking for the presence of a link and writing to the file.
I lean towards not handling symlinks. Besides the raciness that neild mentions, once the program opens the sanitized path, there's other edge cases that could possibly cause trouble, so it's hard to guarantee complete safety:
- Linux bind mounts could still lead you to write a file in a parent directory
- Some filesystems could alter or not support the sanitized filename, e.g. case insensitive filesystems
I think this API would still be safe for a fairly common use case: extracting files from an archive (like a txtar) into a new directory. In that case, you don't have to worry about symlinks.
As for the name: I understand API as "the file path is relative and a child". I agree with Rob that I'm not a big fan of Sanitize
, mainly because it doesn't give me that idea. Perhaps BoundedRelative
, in the sense that we want a relative path that is "bounded to" the "dot" directory.
I'm not a fan of the implicit “under the current directory” behavior — there are lots of cases where I'd like to restrict a path to be within some other working directory (such as the temporary directory returned by t.TempDir
within a particular test function).
I'm also not keen on the “retains a trailing Separator if any” caveat — it's inconsistent with filepath.Abs
and filepath.Join
, and I don't see a clear need for it.
Perhaps instead, we could have variants of Join
and Rel
that sanitize their results?
// JoinInDir joins any number of path elements to dir provided that
// the result is lexically still within dir. Empty elements are ignored.
//
// If the resulting path is not lexically within dir, JoinInDir returns a non-nil error.
func JoinInDir(dir, elem ...string) (string, error)
// RelInDir returns a path that is lexically equivalent to targpath when joined to dir,
// provided that targpath is lexically within dir.
// That is, JoinInDir(dir, RelInDir(dir, targpath)) is equivalent to targpath itself.
//
// An error is returned if targpath can't be made relative to basepath, if knowing
// the current working directory would be necessary to compute it, or if
// the resulting path would not be within dir when joined to it.
func RelInDir(dir, targpath string) (string, error)
Then, if needed, something very close to Sanitize
could be written in terms of those:
func Sanitize(path string) (string, error) {
if filepath.IsAbs(path) {
cwd, err := os.Getwd()
if _, err := filepath.RelInDir(cwd, path); err != nil {
return "", fmt.Errorf("%s is absolute", path)
}
return filepath.Clean(path)
}
return filepath.JoinInDir(".", path)
}
To restrict a path to some other working directory, you join the sanitized path to that directory:
sanitized, err := filepath.Sanitize(untrusted)
if err != nil { /* ... */ }
// path is guaranteed to be under /base/dir.
path := filepath.Join("/base/dir", sanitized)
One distinction between this and JoinInDir
is that the above will return an error if the untrusted path is absolute, while JoinInDir
--if consistent with Join
--will silently convert absolute paths into relative ones. (JoinInDir("/a", "/b")
is presumably "/a/b"
)
But I agree that it's very common to want to safely append an untrusted path to a trusted one, so perhaps the right operation here is a safer join.
I do think the function name should convey a sense that it's more secure than the alternatives. JoinInDir
is descriptive, but doesn't strongly indicate that it's more suitable for use on untrusted inputs than Join
. Perhaps SafeJoin
?
For relative paths, maybe there's a tree-reflexive property, where joining a relative path with a prefix only results in targeting the subset of nodes that are the child nodes of the prefix. The mental hiccup is assuming tree-reflexivity when that's not a general property of relative paths.
I think coercing the tree-reflexive property (or an error) from a relative path can be seen as a unary operation on a relative path, or a binary operation on a relative path with nothing.
RefSafe(prefix string, rel string)
with the tweaked notion of reflexivity for e.g. RefSafe("", "../")
I think would be easy enough to grok. It could handle an empty, relative, or absolute prefix.
Still, ChildSafe()
would be fun.
ISTM, most (all?) cases where you would want to use this involve os paths (since an fs.FS is rooted anyway) and joining to a base. I'm not sure when you would use path.Sanitize and not immediately turn around and do filepath.Join. So maybe it should just be filepath.SafeJoin(base string, followSymlinks bool, paths ...string) string
.
SafeJoin
seems easier to reason about, and I agree that most cases where you want this are going to join the path to a base anyway.
I'm not sold on the followSymlinks
parameter. At worst, this provides a false sense of security and space for TOCTOU bugs; the fact that no symlink exists along that path now says nothing about the future.
I'm not sure when you would use path.Sanitize and not immediately turn around and do filepath.Join.
I think there are use cases where it's "eventually" or even "never", rather than "immediately". For example, a CLI tool taking a path as an argument might not Join
until a non-trivial amount of work is done, but if it's going to fail when Sanitize
fails it would be better to reject when parsing arguments.
Thinking more about symlinks, filename sanitization is the wrong time to check for links. Links need to be checked for atomically at file open time, to avoid TOCTOU bugs. Possibly this indicates that we should have a function or functions in the os
package to securely open files without traversing symlinks, but I don't think filepath
is the right place for this.
One issue with SafeJoin
is that Join
treats absolute paths in components after the first as relative paths. I don't think that's the right behavior for handling untrusted paths; an absolute path should be an error, not silently converted to a relative path.
Would it be reasonable for SafeJoin
to be inconsistent with Join
here?
Join("/a", "/b") // /a/b
SafeJoin("/a", "/b") // error, /b is absolute
It should be an error if it escapes, regardless of how it does so. But SafeJoin("/a", "/a/b")
should work fine.
It should be an error if it escapes, regardless of how it does so. But
SafeJoin("/a", "/a/b")
should work fine.
What if the base directory is a secret for some reason, then an attacker sends absolute paths until it finds one that succeeds? ISTM, it's safer to just fail any absolute paths in the paths parameter.
I believe we have two possible APIs at this point. One is a Sanitize
function which takes a path and makes it safe. This is my original proposal (although I'm dropping the preservation of a trailing /
here).
// Sanitize returns a representation of path under the current
// directory. If path is absolute or represents a location
// outside the current directory, Sanitize returns an error.
//
// Sanitize calls Clean on the result.
//
// If a base path is joined to the result of Sanitize with Join,
// the resulting path will be contained within basepath.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the sanitized path to reference a location
// outside the current directory.
func Sanitize(path string) (string, error)
The other possibility is a SafeJoin
.
// SafeJoin joins a base path to a relative path.
// The resulting path is guaranteed to contained within the base path.
// If relpath is an absolute path or contains any reference to a location outside the base path,
// SafeJoin returns an error. For example, SafeJoin("a", "../a/b") is an error, even though the
// final path is contained within the base path.
//
// SafeJoin calls Clean on the result.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the resulting path to reference a location outside the base path.
func SafeJoin(basepath, relpath) (string, error)
Either of these functions may be trivially implemented in terms of the other.
func Sanitize(path string) (string, error) {
return SafeJoin("", path)
}
func SafeJoin(basepath, relpath) (string, error) {
s, err := Sanitize(relpath)
if err != nil {
return "", err
}
return Join(basepath, s)
}
Most use cases for sanitized paths will join an untrusted path to a trusted base directory, so SafeJoin
directly supports the more common operation. However, SafeJoin
is inconsistent with Join
: Join("a", "/b")
is "a/b"
, while SafeJoin
considers this an error. (We could make SafeJoin
consistent with Join
here, but I'm not convinced that silently converting an untrusted absolute path to a relative one is the correct behavior.)
I lean slightly towards Sanitize
, but only slightly.
Good summary. My preference is for SafeJoin. It feels like the more common case to me, and the names "Sanitize" vs. "Clean" are too similar and easy to get backwards.
(We could make
SafeJoin
consistent withJoin
here, but I'm not convinced that silently converting an untrusted absolute path to a relative one is the correct behavior.)
Could you give some more detail on that? It seems like the security property we want from SafeJoin
is “no writes outside of a particular working directory”, but Join("a", "/b")
→ "a/b"
does retain that property.
(And, for that matter, users who are familiar with Join
can always precede the call to SafeJoin
with a call to filepath.IsAbs
, although I'm not entirely sure whether that would flag the argument "/b"
on Windows.)
Could you give some more detail on that? It seems like the security property we want from SafeJoin is “no writes outside of a particular working directory”, but Join("a", "/b") → "a/b" does retain that property.
That retains this security property, but it introduces a silent rewrite of an unsafe path into a safe-but-different form. I think that it's much clearer to be explicit and to never attempt to transform a path.
For example, I think it's much better for a naive tar extractor to produce an error when trying to extract a file named /etc/passwd
than to silently convert the filename to ./etc/passwd
.
This also provides better behavior and consistency on Windows: SafeJoin(`a`, `C:\a`)
is an error, not a\C:\a
. SafeJoin(`a`, `\\host\share\b`)
is not a\host\share\b
.
If a user does want to convert absolute filenames to relative ones, this can be trivially achieved by prepending ./
: SafeJoin(base, "./"+rel)
There's a well-summarized idea section from Kubernetes around a vulnerability in volume mount paths concerning subpaths - another version of a problem where Sanitize
would be useful.
It looks like their solution purely working off of path text, not considering symlinks, relies on a PathWithinBase
, using two absolute paths.
The solution for safely mounting with symlinks apparently justified an entire package itself.
I think I prefer SafeJoin
behavior to other options, but seeing how the symlink problem is handled here makes me wonder if JoinWithinBase
is a more obvious spelling.
An interesting question is what should be done with a relative path of "."
. Is the base directory itself within the set of safe paths?
Sanitize(".") // "." or an error?
SafeJoin("a", ".") // "a" or an error?
I can think of arguments for either accepting or rejecting "."
, but we should document it either way.
Right now, I lean towards rejecting "."
, on the grounds that it's the safer alternative; rejecting a path the user might have wanted is less hazardous than accepting one they might not have.
rejecting a path the user might have wanted is less hazardous than accepting one they might not have.
I don't think I agree with that. Rejecting a path that the user might have wanted will encourage them to use a different function instead, which carries its own risks.
I think the specification for the functions is clearer if they allow a relative path of "."
. (“Outside” and “within” seem like clear enough words to describe what we mean.)
Another possible API:
// IsSafe reports whether the path refers to a location within the current directory.
// If path is absolute or references a location above the current directory, IsSafe returns an error.
// On Windows, IsSafe will return an error if path refers to a reserved filename such as "NUL".
//
// If IsSafe(path) returns nil, then Join(base, path) will always produce a path contained within base.
//
// IsSafe does not consider symbolic links, which can cause the file named by path to exist outside
// the current directory.
func IsSafe(path string) error
Sanitize
and SafeJoin
are just an IsSafe
check followed by Clean
or Join
.
I like the functionality of IsSafe
, but I'm not fond of the name. There can be unsafe paths in the current directory (What if the current directory happens to be /dev
? What if the path refers to a symlink to elsewhere?). What about IsLocalPath
?
I don't understand Sanitize
and IsSafe
. A path cannot be safe in and of itself, it can only be safe for use in a particular context. And the proposed functions don't have that context.
I'm more partial to SafeJoin
. But it seems like what is really called for is something like func SafeOpenInDir(root, path string) (*os.File, error)
which would also be able to do symlink checks, etc.
There is no universal definition of "safe", but there is a category of paths which reference locations which are lexically contained within some root. Perhaps there's a better name for these paths than "safe" or "sanitized".
IsLocalPath
feels like the inverse of IsNetworkPath
, which isn't right.
I think the name should convey a sense that this is a function one uses on untrusted paths to make them trustworthy (within limits).
But it seems like what is really called for is something like func SafeOpenInDir(root, path string) (*os.File, error) which would also be able to do symlink checks, etc.
I think there's a good argument to be made for a function like SafeOpenInDir
, although I'm not sure exactly what it should look like. It's not a replacement for Sanitize
/SafeJoin
/IsSafe
, however. For example (as mentioned in https://github.com/golang/go/issues/56219#issuecomment-1278168302), a webserver serving static files will want to sanitize untrusted paths to avoid escaping the content root, but may well want to follow symlinks because the local filesystem is considered to be trusted. (And indeed, this is exactly what http.FileSystem
does.)
It's worth noting that of the CVEs stemming from the lack of filename sanitization in archive/tar
and archive/zip
(see #55356), I don't believe any are due to inadvertent symlink traversal. Symlink traversal generally only becomes a vulnerability when the attacker has some level write access to the filesystem; this is a comparatively unusual scenario.
there is a category of paths which reference locations which are lexically contained within some root.
Yes, that is what I was trying to get at with IsLocalPath
. What is a good name for that category of paths?
there is a category of paths which reference locations which are lexically contained within some root.
Yes, that is what I was trying to get at with
IsLocalPath
. What is a good name for that category of paths?
UnrootedPath? FloatingPath? PathFragment? Subpath?
I tend to agree that the names “safe” and “sanitize” are not particularly descriptive, especially if new categories of unsafe or unsanitary paths are uncovered in the future. (Compare with, say, the word “new” used in an API to refer to version 2 of something.)
Moreover, I'm not sure that the word “Safe” is all that helpful in directing users to the appropriate API: if the user is reading the documentation, then it probably suffices to add a short sentence near the beginning of the function's doc comment (like “For most uses, prefer JoinInDir as a safer alternative.”).
If the user is not reading the documentation (and is instead coding from memory), then it's probably better for the new function to have a name starting with a common prefix with the less-safe alternative: that makes it more likely to be noticed in autocomplete results and more likely to be noticed when looking at the original function's documentation on https://pkg.go.dev.
I don't think the IsSafe
API is all that useful. Consider how it would be used:
if err := filepath.IsSafe(rel); err != nil {
return fmt.Errorf(…, err)
}
abs := filepath.Join(dir, rel)
But if the user remembers to do that, why wouldn't they just do the equivalent post-Join
check using the existing API, which is not that much more code?
abs := filepath.Join(dir, rel)
if clean, err := filepath.Rel(dir, abs); err != nil {
return fmt.Errorf(…, err)
} else if elem, _, _ := strings.Cut(clean, filepath.Separator); elem == ".." {
return fmt.Errorf(…, err)
}
Not much more code, but two function calls, two constants (Separator
and ..
), it incorrectly accepts dir="a", rel="../a"
, and I'm pretty sure that doesn't handle NUL
correctly at the moment. Path sanitization is tricky.
If new categories of unsafe or unsanitary paths are discovered, then we should detect them in the proposed functions. That's another reason we need some form of API for path sanitization: To capture user intent. (Contrast filepath.Abs
, where we don't know if the user is asking if a path will still refer to the same entity if the current directory changes, or if the path escapes the current directory. These are the same question on Unix, but not on Windows.)
I'm increasingly convinced that "safe" or "sanitized" is the right word to use here. The property we are concerned with here is lexical safety: Paths from untrusted sources which have been confirmed safe to use. A name like JoinInDir
is descriptive, but does not capture the intent of the operation.
If new categories of unsafe or unsanitary paths are discovered, then we should detect them in the proposed functions.
It is not at all obvious to me that the notion of “safe” is precise or universal enough for that to align with the Go compatibility policy. “Lexically in the same directory” is a clear property for which we can surely promise backward-compatibility. “Safe to use” requires additional context (“...for what purpose?”) and may mean different things to different users.
For example, note how many comments on this issue alone are concerned with symlinks. I do not consider symlinks “unsafe” to use, although they may certainly be unsafe to create — but I don't think my opinion on the safety of symlinks is universally held.
“Lexically in the same directory” is a clear property for which we can surely promise backward-compatibility.
Alas, this is also not the right property. "COM1" is lexically in the current directory, but it is not safe on Windows.