go icon indicating copy to clipboard operation
go copied to clipboard

proposal: path/filepath: Sanitize

Open neild opened this issue 2 years ago • 37 comments

It is common for programs to accept filenames from untrusted sources. For example, an archive extractor might create files based on names in the archive, or a webserver may serve the content of local files identified by a URL path. In these cases, the program should usually sanitize the untrusted filenames before use. (See #55356 for a list of vulnerabilities caused by using unsanitized archive paths.)

We should provide a simple path sanitization function which accepts an untrusted input and returns something reasonably safe.

// Sanitize returns a representation of path under the current
// directory. If path is absolute or represents a location
// outside the current directory, Sanitize returns an error.
//
// Sanitize calls Clean on the result, but retains a trailing Separator if any.
//
// If a base path is joined to the result of Sanitize with Join,
// the resulting path will be contained within basepath.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the sanitized path to reference a location
// outside the current directory.
func Sanitize(path string) (string, error)

Examples:

Sanitize("a") = "a", <nil>
Sanitize("a/b") = "a/b", <nil>
Sanitize("a/b/../c") = "a/c", <nil>
Sanitize("../a") = "", "../a" is outside the current directory
Sanitize("..") = "", ".." is outside the current directory
Sanitize(".") = "", "." is the current directory
Sanitize("/") = "", "/" is absolute
Sanitize("/a") = "", "/a" is absolute
Sanitize("a/") = "a/", <nil>
Sanitize("a/b/") = "a/b/", <nil>

// on Windows
Sanitize("NUL") = "", "NUL" is absolute

https://go.dev/play/p/EDzG8D15Zed contains a sample implementation.

neild avatar Oct 13 '22 18:10 neild

I like the idea but when I saw the issue title, it's purpose wasn't at all clear to me from the name "Sanitize". It's not about cleanliness (we already have "Clean") but security. Maybe something as simple as "Safe", but with a name like that perhaps it should consider symbolic links. Perhaps it should anyway. A conundrum.

robpike avatar Oct 13 '22 20:10 robpike

I'm not sold on the name, although "Sanitize" does have the connotation of being more than just clean. ("To reduce or eliminate pathogenic agents", says Merriam-Webster.)

Symbolic links are an interesting question. If we do consider them, then we need to know the base path that the untrusted path is relative to. Not all users will want to avoid symbolic links, either; an archive extractor writing to an output file wants to avoid writing through symlinks, but a web server probably wants to follow links in the directory it is serving from. In some cases, there might be a race condition between checking for the presence of a link and writing to the file.

neild avatar Oct 13 '22 20:10 neild

I lean towards not handling symlinks. Besides the raciness that neild mentions, once the program opens the sanitized path, there's other edge cases that could possibly cause trouble, so it's hard to guarantee complete safety:

  • Linux bind mounts could still lead you to write a file in a parent directory
  • Some filesystems could alter or not support the sanitized filename, e.g. case insensitive filesystems

I think this API would still be safe for a fairly common use case: extracting files from an archive (like a txtar) into a new directory. In that case, you don't have to worry about symlinks.

As for the name: I understand API as "the file path is relative and a child". I agree with Rob that I'm not a big fan of Sanitize, mainly because it doesn't give me that idea. Perhaps BoundedRelative, in the sense that we want a relative path that is "bounded to" the "dot" directory.

mvdan avatar Oct 14 '22 07:10 mvdan

I'm not a fan of the implicit “under the current directory” behavior — there are lots of cases where I'd like to restrict a path to be within some other working directory (such as the temporary directory returned by t.TempDir within a particular test function).

I'm also not keen on the “retains a trailing Separator if any” caveat — it's inconsistent with filepath.Abs and filepath.Join, and I don't see a clear need for it.

Perhaps instead, we could have variants of Join and Rel that sanitize their results?

// JoinInDir joins any number of path elements to dir provided that
// the result is lexically still within dir. Empty elements are ignored.
//
// If the resulting path is not lexically within dir, JoinInDir returns a non-nil error.
func JoinInDir(dir, elem ...string) (string, error)

// RelInDir returns a path that is lexically equivalent to targpath when joined to dir,
// provided that targpath is lexically within dir.
// That is, JoinInDir(dir, RelInDir(dir, targpath)) is equivalent to targpath itself.
//
// An error is returned if targpath can't be made relative to basepath, if knowing
// the current working directory would be necessary to compute it, or if
// the resulting path would not be within dir when joined to it.
func RelInDir(dir, targpath string) (string, error)

Then, if needed, something very close to Sanitize could be written in terms of those:

func Sanitize(path string) (string, error) {
	if filepath.IsAbs(path) {
		cwd, err := os.Getwd()
		if _, err := filepath.RelInDir(cwd, path); err != nil {
			return "", fmt.Errorf("%s is absolute", path)
		}
		return filepath.Clean(path)
	}

	return filepath.JoinInDir(".", path)
}

bcmills avatar Oct 14 '22 16:10 bcmills

To restrict a path to some other working directory, you join the sanitized path to that directory:

sanitized, err := filepath.Sanitize(untrusted)
if err != nil { /* ... */ }

// path is guaranteed to be under /base/dir.
path := filepath.Join("/base/dir", sanitized)

One distinction between this and JoinInDir is that the above will return an error if the untrusted path is absolute, while JoinInDir--if consistent with Join--will silently convert absolute paths into relative ones. (JoinInDir("/a", "/b") is presumably "/a/b")

But I agree that it's very common to want to safely append an untrusted path to a trusted one, so perhaps the right operation here is a safer join.

I do think the function name should convey a sense that it's more secure than the alternatives. JoinInDir is descriptive, but doesn't strongly indicate that it's more suitable for use on untrusted inputs than Join. Perhaps SafeJoin?

neild avatar Oct 14 '22 16:10 neild

For relative paths, maybe there's a tree-reflexive property, where joining a relative path with a prefix only results in targeting the subset of nodes that are the child nodes of the prefix. The mental hiccup is assuming tree-reflexivity when that's not a general property of relative paths.

I think coercing the tree-reflexive property (or an error) from a relative path can be seen as a unary operation on a relative path, or a binary operation on a relative path with nothing.

RefSafe(prefix string, rel string) with the tweaked notion of reflexivity for e.g. RefSafe("", "../") I think would be easy enough to grok. It could handle an empty, relative, or absolute prefix.

Still, ChildSafe() would be fun.

AndrewHarrisSPU avatar Oct 15 '22 22:10 AndrewHarrisSPU

ISTM, most (all?) cases where you would want to use this involve os paths (since an fs.FS is rooted anyway) and joining to a base. I'm not sure when you would use path.Sanitize and not immediately turn around and do filepath.Join. So maybe it should just be filepath.SafeJoin(base string, followSymlinks bool, paths ...string) string.

earthboundkid avatar Oct 20 '22 20:10 earthboundkid

SafeJoin seems easier to reason about, and I agree that most cases where you want this are going to join the path to a base anyway.

I'm not sold on the followSymlinks parameter. At worst, this provides a false sense of security and space for TOCTOU bugs; the fact that no symlink exists along that path now says nothing about the future.

neild avatar Oct 20 '22 21:10 neild

I'm not sure when you would use path.Sanitize and not immediately turn around and do filepath.Join.

I think there are use cases where it's "eventually" or even "never", rather than "immediately". For example, a CLI tool taking a path as an argument might not Join until a non-trivial amount of work is done, but if it's going to fail when Sanitize fails it would be better to reject when parsing arguments.

AndrewHarrisSPU avatar Oct 20 '22 22:10 AndrewHarrisSPU

Thinking more about symlinks, filename sanitization is the wrong time to check for links. Links need to be checked for atomically at file open time, to avoid TOCTOU bugs. Possibly this indicates that we should have a function or functions in the os package to securely open files without traversing symlinks, but I don't think filepath is the right place for this.

One issue with SafeJoin is that Join treats absolute paths in components after the first as relative paths. I don't think that's the right behavior for handling untrusted paths; an absolute path should be an error, not silently converted to a relative path.

Would it be reasonable for SafeJoin to be inconsistent with Join here?

Join("/a", "/b")     // /a/b
SafeJoin("/a", "/b") // error, /b is absolute

neild avatar Oct 20 '22 23:10 neild

It should be an error if it escapes, regardless of how it does so. But SafeJoin("/a", "/a/b") should work fine.

jimmyfrasche avatar Oct 20 '22 23:10 jimmyfrasche

It should be an error if it escapes, regardless of how it does so. But SafeJoin("/a", "/a/b") should work fine.

What if the base directory is a secret for some reason, then an attacker sends absolute paths until it finds one that succeeds? ISTM, it's safer to just fail any absolute paths in the paths parameter.

earthboundkid avatar Oct 21 '22 01:10 earthboundkid

I believe we have two possible APIs at this point. One is a Sanitize function which takes a path and makes it safe. This is my original proposal (although I'm dropping the preservation of a trailing / here).

// Sanitize returns a representation of path under the current
// directory. If path is absolute or represents a location
// outside the current directory, Sanitize returns an error.
//
// Sanitize calls Clean on the result.
//
// If a base path is joined to the result of Sanitize with Join,
// the resulting path will be contained within basepath.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the sanitized path to reference a location
// outside the current directory.
func Sanitize(path string) (string, error)

The other possibility is a SafeJoin.

// SafeJoin joins a base path to a relative path.
// The resulting path is guaranteed to contained within the base path.
// If relpath is an absolute path or contains any reference to a location outside the base path,
// SafeJoin returns an error. For example, SafeJoin("a", "../a/b") is an error, even though the
// final path is contained within the base path.
//
// SafeJoin calls Clean on the result.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the resulting path to reference a location outside the base path.
func SafeJoin(basepath, relpath) (string, error)

Either of these functions may be trivially implemented in terms of the other.

func Sanitize(path string) (string, error) {
  return SafeJoin("", path)
}

func SafeJoin(basepath, relpath) (string, error) {
  s, err := Sanitize(relpath)
  if err != nil {
    return "", err
  }
  return Join(basepath, s)
}

Most use cases for sanitized paths will join an untrusted path to a trusted base directory, so SafeJoin directly supports the more common operation. However, SafeJoin is inconsistent with Join: Join("a", "/b") is "a/b", while SafeJoin considers this an error. (We could make SafeJoin consistent with Join here, but I'm not convinced that silently converting an untrusted absolute path to a relative one is the correct behavior.)

I lean slightly towards Sanitize, but only slightly.

neild avatar Oct 24 '22 18:10 neild

Good summary. My preference is for SafeJoin. It feels like the more common case to me, and the names "Sanitize" vs. "Clean" are too similar and easy to get backwards.

earthboundkid avatar Oct 24 '22 18:10 earthboundkid

(We could make SafeJoin consistent with Join here, but I'm not convinced that silently converting an untrusted absolute path to a relative one is the correct behavior.)

Could you give some more detail on that? It seems like the security property we want from SafeJoin is “no writes outside of a particular working directory”, but Join("a", "/b")"a/b" does retain that property.

(And, for that matter, users who are familiar with Join can always precede the call to SafeJoin with a call to filepath.IsAbs, although I'm not entirely sure whether that would flag the argument "/b" on Windows.)

bcmills avatar Oct 24 '22 19:10 bcmills

Could you give some more detail on that? It seems like the security property we want from SafeJoin is “no writes outside of a particular working directory”, but Join("a", "/b") → "a/b" does retain that property.

That retains this security property, but it introduces a silent rewrite of an unsafe path into a safe-but-different form. I think that it's much clearer to be explicit and to never attempt to transform a path.

For example, I think it's much better for a naive tar extractor to produce an error when trying to extract a file named /etc/passwd than to silently convert the filename to ./etc/passwd.

This also provides better behavior and consistency on Windows: SafeJoin(`a`, `C:\a`) is an error, not a\C:\a. SafeJoin(`a`, `\\host\share\b`) is not a\host\share\b.

If a user does want to convert absolute filenames to relative ones, this can be trivially achieved by prepending ./: SafeJoin(base, "./"+rel)

neild avatar Oct 24 '22 20:10 neild

There's a well-summarized idea section from Kubernetes around a vulnerability in volume mount paths concerning subpaths - another version of a problem where Sanitize would be useful.

It looks like their solution purely working off of path text, not considering symlinks, relies on a PathWithinBase, using two absolute paths.

The solution for safely mounting with symlinks apparently justified an entire package itself.

I think I prefer SafeJoin behavior to other options, but seeing how the symlink problem is handled here makes me wonder if JoinWithinBase is a more obvious spelling.

AndrewHarrisSPU avatar Oct 24 '22 21:10 AndrewHarrisSPU

An interesting question is what should be done with a relative path of ".". Is the base directory itself within the set of safe paths?

Sanitize(".")      // "." or an error?
SafeJoin("a", ".") // "a" or an error?

I can think of arguments for either accepting or rejecting ".", but we should document it either way.

Right now, I lean towards rejecting ".", on the grounds that it's the safer alternative; rejecting a path the user might have wanted is less hazardous than accepting one they might not have.

neild avatar Oct 25 '22 15:10 neild

rejecting a path the user might have wanted is less hazardous than accepting one they might not have.

I don't think I agree with that. Rejecting a path that the user might have wanted will encourage them to use a different function instead, which carries its own risks.

I think the specification for the functions is clearer if they allow a relative path of ".". (“Outside” and “within” seem like clear enough words to describe what we mean.)

bcmills avatar Oct 25 '22 15:10 bcmills

Another possible API:

// IsSafe reports whether the path refers to a location within the current directory.
// If path is absolute or references a location above the current directory, IsSafe returns an error.
// On Windows, IsSafe will return an error if path refers to a reserved filename such as "NUL".
//
// If IsSafe(path) returns nil, then Join(base, path) will always produce a path contained within base.
//
// IsSafe does not consider symbolic links, which can cause the file named by path to exist outside
// the current directory.
func IsSafe(path string) error

Sanitize and SafeJoin are just an IsSafe check followed by Clean or Join.

neild avatar Oct 25 '22 20:10 neild

I like the functionality of IsSafe, but I'm not fond of the name. There can be unsafe paths in the current directory (What if the current directory happens to be /dev? What if the path refers to a symlink to elsewhere?). What about IsLocalPath?

ianlancetaylor avatar Oct 25 '22 23:10 ianlancetaylor

I don't understand Sanitize and IsSafe. A path cannot be safe in and of itself, it can only be safe for use in a particular context. And the proposed functions don't have that context.

I'm more partial to SafeJoin. But it seems like what is really called for is something like func SafeOpenInDir(root, path string) (*os.File, error) which would also be able to do symlink checks, etc.

magical avatar Oct 25 '22 23:10 magical

There is no universal definition of "safe", but there is a category of paths which reference locations which are lexically contained within some root. Perhaps there's a better name for these paths than "safe" or "sanitized".

IsLocalPath feels like the inverse of IsNetworkPath, which isn't right.

I think the name should convey a sense that this is a function one uses on untrusted paths to make them trustworthy (within limits).

But it seems like what is really called for is something like func SafeOpenInDir(root, path string) (*os.File, error) which would also be able to do symlink checks, etc.

I think there's a good argument to be made for a function like SafeOpenInDir, although I'm not sure exactly what it should look like. It's not a replacement for Sanitize/SafeJoin/IsSafe, however. For example (as mentioned in https://github.com/golang/go/issues/56219#issuecomment-1278168302), a webserver serving static files will want to sanitize untrusted paths to avoid escaping the content root, but may well want to follow symlinks because the local filesystem is considered to be trusted. (And indeed, this is exactly what http.FileSystem does.)

It's worth noting that of the CVEs stemming from the lack of filename sanitization in archive/tar and archive/zip (see #55356), I don't believe any are due to inadvertent symlink traversal. Symlink traversal generally only becomes a vulnerability when the attacker has some level write access to the filesystem; this is a comparatively unusual scenario.

neild avatar Oct 26 '22 02:10 neild

there is a category of paths which reference locations which are lexically contained within some root.

Yes, that is what I was trying to get at with IsLocalPath. What is a good name for that category of paths?

ianlancetaylor avatar Oct 26 '22 03:10 ianlancetaylor

there is a category of paths which reference locations which are lexically contained within some root.

Yes, that is what I was trying to get at with IsLocalPath. What is a good name for that category of paths?

UnrootedPath? FloatingPath? PathFragment? Subpath?

magical avatar Oct 26 '22 04:10 magical

I tend to agree that the names “safe” and “sanitize” are not particularly descriptive, especially if new categories of unsafe or unsanitary paths are uncovered in the future. (Compare with, say, the word “new” used in an API to refer to version 2 of something.)

Moreover, I'm not sure that the word “Safe” is all that helpful in directing users to the appropriate API: if the user is reading the documentation, then it probably suffices to add a short sentence near the beginning of the function's doc comment (like “For most uses, prefer JoinInDir as a safer alternative.”).

If the user is not reading the documentation (and is instead coding from memory), then it's probably better for the new function to have a name starting with a common prefix with the less-safe alternative: that makes it more likely to be noticed in autocomplete results and more likely to be noticed when looking at the original function's documentation on https://pkg.go.dev.

bcmills avatar Oct 26 '22 13:10 bcmills

I don't think the IsSafe API is all that useful. Consider how it would be used:

if err := filepath.IsSafe(rel); err != nil {
	return fmt.Errorf(…, err)
}
abs := filepath.Join(dir, rel)

But if the user remembers to do that, why wouldn't they just do the equivalent post-Join check using the existing API, which is not that much more code?

abs := filepath.Join(dir, rel)
if clean, err := filepath.Rel(dir, abs); err != nil {
	return fmt.Errorf(…, err)
} else if elem, _, _ := strings.Cut(clean, filepath.Separator); elem == ".." {
	return fmt.Errorf(…, err)
}

bcmills avatar Oct 26 '22 13:10 bcmills

Not much more code, but two function calls, two constants (Separator and ..), it incorrectly accepts dir="a", rel="../a", and I'm pretty sure that doesn't handle NUL correctly at the moment. Path sanitization is tricky.

If new categories of unsafe or unsanitary paths are discovered, then we should detect them in the proposed functions. That's another reason we need some form of API for path sanitization: To capture user intent. (Contrast filepath.Abs, where we don't know if the user is asking if a path will still refer to the same entity if the current directory changes, or if the path escapes the current directory. These are the same question on Unix, but not on Windows.)

I'm increasingly convinced that "safe" or "sanitized" is the right word to use here. The property we are concerned with here is lexical safety: Paths from untrusted sources which have been confirmed safe to use. A name like JoinInDir is descriptive, but does not capture the intent of the operation.

neild avatar Oct 26 '22 15:10 neild

If new categories of unsafe or unsanitary paths are discovered, then we should detect them in the proposed functions.

It is not at all obvious to me that the notion of “safe” is precise or universal enough for that to align with the Go compatibility policy. “Lexically in the same directory” is a clear property for which we can surely promise backward-compatibility. “Safe to use” requires additional context (“...for what purpose?”) and may mean different things to different users.

For example, note how many comments on this issue alone are concerned with symlinks. I do not consider symlinks “unsafe” to use, although they may certainly be unsafe to create — but I don't think my opinion on the safety of symlinks is universally held.

bcmills avatar Oct 26 '22 16:10 bcmills

“Lexically in the same directory” is a clear property for which we can surely promise backward-compatibility.

Alas, this is also not the right property. "COM1" is lexically in the current directory, but it is not safe on Windows.

neild avatar Oct 26 '22 17:10 neild