caddy icon indicating copy to clipboard operation
caddy copied to clipboard

Problem with path matching and URI manipulation when involving double slashes

Open RussellLuo opened this issue 3 years ago • 4 comments

Problem

1. Path matcher containing double slashes won't match any request

Caddyfile:

:8080 {
    handle_path //prefix/* {
        reverse_proxy 127.0.0.1:9090
    }
    respond "oops"
}

Neither of the following requests will match handle_path:

$ curl http://127.0.0.1:8080/prefix/path
oops
$ curl http://127.0.0.1:8080//prefix/path
oops

2. Normal handle_path can match but can't strip prefix from requests containing double slashes

Caddyfile:

:8080 {
    handle_path /prefix/* {
        reverse_proxy 127.0.0.1:9090
    }
    respond "oops"
}

:9090 {
    respond {path}
}

As shown below, the second request containing double slashes will be forwarded to the upstream without being stripped:

$ curl http://127.0.0.1:8080/prefix/path 
/path
$ curl http://127.0.0.1:8080//prefix/path                                                     
//prefix/path

Cause

I think the above problems are explainable, since "path cleaning" was introduced after the fix #4407:

The solution is to unescape (convert URL-encoded characters) then clean the path, before matching.

Possible solution

If the problems mentioned above are not the normal behaviors as expected by Caddy, I think maybe we should also introduce "path cleaning" while doing path matching (besides cleaning the source path, we should also clean the target path) and URI manipulation.

RussellLuo avatar Apr 28 '22 08:04 RussellLuo

As for problem (2), I have tested the behavior of Nginx with the following config:

server {
    listen 8080;

    location /prefix/ {
        proxy_pass http://127.0.0.1:9090/; # The trailing slash matters.
    } 
}

And here are the corresponding error logs (since no upstream is defined):

..., request: "GET /prefix/path HTTP/1.1", upstream: "http://127.0.0.1:9090/path", host: "127.0.0.1:8080"
..., request: "GET //prefix/path HTTP/1.1", upstream: "http://127.0.0.1:9090/path", host: "127.0.0.1:8080"

RussellLuo avatar Apr 29 '22 01:04 RussellLuo

Instead of cleaning path implicitly within path matchers, how about enhancing the rewrite handler to let users do it explicitly if needed?

For example, as a possible solution, we can add an attribute NormalizePath to Rewrite as below:

type Rewrite struct {
	...

	// Strips the given prefix from the beginning of the URI path.
	StripPathPrefix string `json:"strip_path_prefix,omitempty"`

	// Strips the given suffix from the end of the URI path.
	StripPathSuffix string `json:"strip_path_suffix,omitempty"`

	// Performs substring replacements on the URI.
	URISubstring []substrReplacer `json:"uri_substring,omitempty"`

	// Performs regular expression replacements on the URI path.
	PathRegexp []*regexReplacer `json:"path_regexp,omitempty"`

+	// Normalize the URI path by decoding the text encoded in the “%XX” form,
+	// resolving references to relative path components “.” and “..”, and
+	// merging multiple slashes into a single slash.
+	NormalizePath bool `json:"normalize_path,omitempty"`

	logger *zap.Logger
}

Take the problem (2) for example, the corresponding Caddyfile might look like this:

:8080 {
    uri normalize_path
    handle_path /prefix/* {
        reverse_proxy 127.0.0.1:9090
    }
    respond "oops"
}

:9090 {
    respond {path}
}

As for the original problem with basicauth - the case where "path cleaning" was introduced - I think this may be an alternative fix:

example.com {
    root * /srv

    uri normalize_path
    basicauth /secret-files* {
        bob <hash>
    }   

    file_server
}

RussellLuo avatar May 03 '22 10:05 RussellLuo

Thanks Russell -- as discussed in Slack, I wonder if a good solution for this would be an exported function in the caddyhttp package, that matchers and handlers can use as needed.

Edit: actually, maybe SanitizedPathJoin is related or the function we're looking for?

mholt avatar May 05 '22 18:05 mholt

@RussellLuo I've been thinking on this long and hard for a while. Track #4948 where I am hopefully resolving this.

mholt avatar Aug 10 '22 05:08 mholt

@RussellLuo This is now fixed in #4948. If the prefix (or suffix, or substring) pattern in the rewrite config contains //, then it will be assumed that slashes should not be merged when cleaning the path.

mholt avatar Aug 11 '22 20:08 mholt