`path_regexp` matching whitespace breaks `uri strip_prefix`
It seems that when a request path is matched with `path_regexp` and the resulting capture contains whitespace, using that capture as part of a subsequent `uri strip_prefix` causes the stripping to fail.
Example Caddyfile:
{
	debug
}
localhost:80 {
	@jpegmatch {
		path /api/static/*
		path_regexp sample_params /api/static/([^/]+)/.*.(jpg|jpeg|png)$
	}
	handle @jpegmatch {
		uri strip_prefix /api/static/{http.regexp.sample_params.1}
		root * /app/data/{http.regexp.sample_params.1}/images/
		file_server
	}
}
Excerpt from debug log output:
{
	"level":"debug",
	"ts":1653058747.7992723,
	"logger":"http.handlers.file_server",
	"msg":"sanitized path join",
	"site_root":"/app/data/1517 (2)/images/",
	"request_path":"/api/static/1517 (2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg",
	"result":"/app/data/1517 (2)/images/api/static/1517 (2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg"
}
If I use the following `handle @jpegmatch` block instead, everything works as expected:
handle @jpegmatch {
	uri path_regexp /api/static/([^/]+)/ /
	root * /app/data/{http.regexp.sample_params.1}/images/
	file_server
}
Log excerpt from the `handle @jpegmatch` block above:
{
	"level":"debug",
	"ts":1653058951.8403285,
	"logger":"http.handlers.file_server",
	"msg":"sanitized path join",
	"site_root":"/app/data/1517 (2)/images/",
	"request_path":"/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg",
	"result":"/app/data/1517 (2)/images/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg"
}
The problem is that prefix stripping works on the raw path, which looks like `/api/static/1517%20(2)/`. The browser URL-encodes the space as it sends the request.
The `path_regexp` matcher, though, works on the decoded path, because we clean the path before passing it to the regexp.
So it tries to strip `/api/static/1517 (2)` from `/api/static/1517%20(2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg`, but that doesn't match exactly.
I'm not sure what to suggest here, or whether this should be called a bug or intended behavior. TBD.
I see 🤔 Are there other features that depend on the difference between handling raw and decoded paths? I think it would feel more natural for as many features as possible to operate on decoded paths, but if that would break some other features I'm not aware of, then the tradeoff might not be worth it... This behaviour is quite non-trivial to figure out, though, and I couldn't find anything in the documentation that clarifies it, so maybe at least a note in the `strip_prefix` or `path_regexp` docs could help?
@Jonesus I'm currently working on making this more consistent across Caddy. Follow progress in #4948.
@Jonesus I have pushed a fix for this issue in #4948. I will write up a full explanation of all the changes in that PR soon. But in the meantime please feel free to try it out and confirm it works for you too! (I used your config to test things as I went, and got it working.)