
`path_regexp` matching whitespace breaks `uri strip_prefix`

Open Jonesus opened this issue 2 years ago • 3 comments

When a request path is matched with path_regexp and the captured match contains whitespace, using that capture as part of a following uri strip_prefix causes the stripping to fail: the prefix is left in place.

Example Caddyfile:

{
    debug
}

localhost:80 {
    @jpegmatch {
        path /api/static/*
        path_regexp sample_params /api/static/([^/]+)/.*.(jpg|jpeg|png)$
    }

    handle @jpegmatch {
        uri strip_prefix /api/static/{http.regexp.sample_params.1}
        root * /app/data/{http.regexp.sample_params.1}/images/
        file_server
    }
}

Excerpt from debug log output:

{
    "level":"debug",
    "ts":1653058747.7992723,
    "logger":"http.handlers.file_server",
    "msg":"sanitized path join",
    "site_root":"/app/data/1517 (2)/images/",
    "request_path":"/api/static/1517 (2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg",
    "result":"/app/data/1517 (2)/images/api/static/1517 (2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg"
}

If I use the following handle @jpegmatch instead, everything works as expected:

    handle @jpegmatch {
        uri path_regexp /api/static/([^/]+)/ /
        root * /app/data/{http.regexp.sample_params.1}/images/
        file_server
    }

Log excerpt with the above handle @jpegmatch:

{
    "level":"debug",
    "ts":1653058951.8403285,
    "logger":"http.handlers.file_server",
    "msg":"sanitized path join",
    "site_root":"/app/data/1517 (2)/images/",
    "request_path":"/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg",
    "result":"/app/data/1517 (2)/images/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg"
}
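One thing worth noting about this variant: the pattern does not contain the directory name literally, and [^/]+ matches any run of non-slash characters. That means it matches whether the space arrives as a literal space or percent-encoded as %20. The following is a minimal Python sketch of that observation only; it is not Caddy's implementation, and the variable names are made up for illustration.

```python
import re

# Same pattern as in `uri path_regexp /api/static/([^/]+)/ /`
pattern = r"/api/static/([^/]+)/"

# Two spellings of the same request path (illustrative values from the log above)
raw_path = "/api/static/1517%20(2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg"
decoded_path = "/api/static/1517 (2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg"

# [^/]+ accepts both "1517%20(2)" and "1517 (2)", so the replacement
# succeeds regardless of how the space is represented.
rewritten_raw = re.sub(pattern, "/", raw_path)
rewritten_decoded = re.sub(pattern, "/", decoded_path)

print(rewritten_raw)
print(rewritten_decoded)
```

Both calls leave only the file name part of the path, which is consistent with the request_path shown in the log excerpt.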

Jonesus avatar May 20 '22 15:05 Jonesus

The problem is that prefix stripping works on the raw path, which looks like /api/static/1517%20(2)/. The browser URL-encodes the space as %20 when it sends the request.

The path_regexp matcher, though, works on the decoded path, because we clean the path before passing it to the regexp.

So it tries to strip /api/static/1517 (2) from /api/static/1517%20(2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg, but that doesn't match exactly, so nothing is stripped.
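To make the mismatch concrete, here is a minimal Python sketch. It only illustrates literal prefix comparison against a raw versus a decoded path; it is not Caddy's code, and strip_prefix below is a hypothetical helper written for this example.

```python
from urllib.parse import unquote


def strip_prefix(path: str, prefix: str) -> str:
    """Hypothetical helper: remove prefix only if path starts with it literally."""
    return path[len(prefix):] if path.startswith(prefix) else path


# Raw path as sent by the browser (space encoded as %20)
raw_path = "/api/static/1517%20(2)/icon_da4e50dd-f069-4fcd-9088-7fbb08711a04.jpg"
# Decoded form, which is what the regexp capture was taken from
decoded_path = unquote(raw_path)

# Prefix built from the capture group, so it contains a literal space
prefix = "/api/static/1517 (2)"

# Against the raw path the prefix does not match, so the path is returned unchanged
stripped_raw = strip_prefix(raw_path, prefix)
# Against the decoded path the same prefix matches and is removed
stripped_decoded = strip_prefix(decoded_path, prefix)

print(stripped_raw)
print(stripped_decoded)
```

The unchanged result in the raw case corresponds to the request_path in the first debug log, where /api/static/1517 (2)/ is still present.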

I'm not sure what to suggest here. I'm not sure whether this should be called a bug or intended behavior. TBD.

francislavoie avatar May 20 '22 17:05 francislavoie

I see :thinking: Are there other features that depend on the difference between handling raw and decoded paths? I think it would feel more natural for as many features as possible to operate on decoded paths, but if that would break other features I'm not aware of, then the tradeoff might not be worth it... This behaviour is quite hard to figure out, though, and I couldn't find anything in the documentation that would clarify it, so maybe at least a note in the strip_prefix or path_regexp docs would help?

Jonesus avatar May 21 '22 08:05 Jonesus

@Jonesus I'm currently working on making this more consistent across Caddy. Follow progress in #4948.

mholt avatar Aug 10 '22 05:08 mholt

@Jonesus I have pushed a fix for this issue in #4948. I will write up a full explanation of all the changes in that PR soon. But in the meantime please feel free to try it out and confirm it works for you too! (I used your config to test things as I went, and got it working.)

mholt avatar Aug 12 '22 05:08 mholt