bottle
Router unable to match wildcard filter in the middle of a URL
I'm trying to use the :path wildcard filter (as described here) to match part of my URL, which includes a forward slash character. For example, if I have the URL:
/resources/adfs89s7/container/asdf%2Fasdf/items
(where %2F is the forward slash), I want to match it to the route:
/resources/<resource_id>/container/<container_name:path>/items
However, this currently returns a 404 Not Found error. I have similar URLs where the wildcard filter is at the end of the URL, e.g.
/resources/<resource_id>/container/<container_name:path>
and that seems to work fine.
The two strings %2F and / are equivalent in a URI path. You can encode any character this way; a equals %61, for example.
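Python's standard urllib.parse module can illustrate what a decoding step does to these strings. This is only a sketch of percent-decoding in general, not Bottle's code. Once a path is decoded, %2F and / are indistinguishable, just as %61 and a are. Note that RFC 3986 (section 2.2) treats the two cases differently: percent-encoding an unreserved character such as a yields an equivalent URI, while percent-encoding a reserved character such as / does not.

```python
from urllib.parse import quote, unquote

# Decoding maps each %XX escape back to its character, so after this
# step an encoded slash can no longer be told apart from a literal one.
print(unquote("%61"))                                  # a
print(unquote("asdf%2Fasdf"))                          # asdf/asdf
print(unquote("asdf%2Fasdf") == unquote("asdf/asdf"))  # True

# quote() leaves "/" alone by default (safe="/"); pass safe="" to escape it.
print(quote("asdf/asdf"))           # asdf/asdf
print(quote("asdf/asdf", safe=""))  # asdf%2Fasdf
```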
@claire-lee What you are trying to do is probably better suited to a regex route because, afaik, the :path wildcard will consume the rest of the path, including escaped and unescaped slashes.
Oh, the :path filter works just fine. I cannot reproduce the error.
>>> import bottle
>>> app = bottle.Bottle()
>>> app.route('/resources/<resource_id>/container/<container_name:path>/items', callback=True)
>>> app.match(dict(PATH_INFO='/resources/adfs89s7/container/asdf%2Fasdf/items', REQUEST_METHOD='GET'))
(..., {'resource_id': 'adfs89s7', 'container_name': 'asdf%2Fasdf'})
>>> app.match(dict(PATH_INFO='/resources/adfs89s7/container/asdf/asdf/items', REQUEST_METHOD='GET'))
(..., {'resource_id': 'adfs89s7', 'container_name': 'asdf/asdf'})
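To see why the :path wildcard can sit in the middle of a rule, here is a rough sketch of the regular expression such a rule turns into. It assumes Bottle's default filter patterns: [^/]+ for a plain <name> wildcard and the non-greedy .+? for :path. The real router also merges rules and handles filters itself, so treat this as an approximation, not the router's actual output.

```python
import re

# Approximation of the rule
#   /resources/<resource_id>/container/<container_name:path>/items
# assuming [^/]+ for a plain wildcard and .+? for :path.
RULE = re.compile(
    r"^/resources/(?P<resource_id>[^/]+)"
    r"/container/(?P<container_name>.+?)/items$"
)

for path in ("/resources/adfs89s7/container/asdf%2Fasdf/items",
             "/resources/adfs89s7/container/asdf/asdf/items"):
    m = RULE.match(path)
    print(m.groupdict() if m else None)
# {'resource_id': 'adfs89s7', 'container_name': 'asdf%2Fasdf'}
# {'resource_id': 'adfs89s7', 'container_name': 'asdf/asdf'}
```

Because .+? is non-greedy and the pattern is anchored with /items$, the wildcard grows only until the trailing /items can match, whether the slashes inside it are literal or encoded.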
The following route returns a 404 when the URL uses %2F, but works fine with a literal forward slash:
http://something.com/test/123/asd
-> 123/asd
http://something.com/test/123%2Fasd
-> 404
@route("/test/<test:re:.+>", method='GET')
def test(test):
    return test
I still cannot reproduce this bug in master or release-v12:
>>> import bottle
>>> app = bottle.Bottle()
>>> app.route('/test/<test:re:.+>', callback=True)
True
>>> app.match(dict(PATH_INFO='/test/123/asd', REQUEST_METHOD='GET'))
(<GET '/test/<test:re:.+>' True>, {'test': '123/asd'})
>>> app.match(dict(PATH_INFO='/test/123%2Fasd', REQUEST_METHOD='GET'))
(<GET '/test/<test:re:.+>' True>, {'test': '123%2Fasd'})
I tried the exact script you posted (plus import statements and a run() at the end). It works as intended:
$ curl http://127.0.0.1:8080/test/123/asd
123/asd
$ curl http://127.0.0.1:8080/test/123%2Fasd
123/asd
😕
I'm facing the same issue. To explain: percent-encoding the slash should prevent it from being interpreted as a hierarchy separator, in contrast to a literal forward slash. The point of encoding is to remove that semantic meaning. That is, I would expect that:
- /test/123%2Fasd refers to the document 123%2Fasd located at /test. As such, it should not match the route /test/123/:doc.
- /test/123/asd refers to the document asd located at /test/123. As such, it should match the route /test/123/:doc.
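Put differently, the expectation above amounts to splitting the raw path on literal slashes first and only then decoding each segment. The two helpers below are hypothetical names used only to contrast that order with decoding the whole path before splitting; neither is part of Bottle.

```python
from urllib.parse import unquote


def split_then_decode(raw_path):
    """Hypothetical helper: split the raw path on literal slashes first,
    then percent-decode each segment, so an encoded %2F stays inside
    its segment instead of becoming a separator."""
    return [unquote(segment) for segment in raw_path.split("/")]


def decode_then_split(raw_path):
    """Hypothetical helper: decode the whole path before splitting, so
    %2F and / collapse into the same separator."""
    return unquote(raw_path).split("/")


print(split_then_decode("/test/123%2Fasd"))  # ['', 'test', '123/asd']
print(split_then_decode("/test/123/asd"))    # ['', 'test', '123', 'asd']
print(decode_then_split("/test/123%2Fasd"))  # ['', 'test', '123', 'asd']
```

With the first order, /test/123%2Fasd has one segment fewer than /test/123/:doc and would not match; with the second, both requests look identical to the router.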
However, given this service:
import bottle

def print_all(*args, **kwargs):
    print(*args, kwargs)

app = bottle.Bottle()
app.get("/test/123/:doc")(print_all)
app.run()
curl http://localhost:8080/test/123%2Fasd
# Server log: {'doc': 'asd'}, should return a 404
curl http://localhost:8080/test/123/asd
# Server log: {'doc': 'asd'}
It seems that the percent-encoding has already been removed before match is called, hence both requests are matched as /test/123/asd. This would explain all the observations:
- match and route, when called with still-encoded path information, work as intended and do not treat the encoded %2F as a path separator.
- A running app never actually calls those methods with the still-encoded parameters that those tests anticipate.
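The decoding can be observed without Bottle at all. Bottle's run() uses the standard library's wsgiref server by default, and wsgiref percent-decodes the request path into PATH_INFO before the application sees it. The sketch below sends one real HTTP request to a throwaway wsgiref server on a random local port and echoes back PATH_INFO; the helper names (QuietHandler, echo_path_info, path_info_for) are made up for this example.

```python
import threading
import urllib.request
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    # Suppress the per-request access log that wsgiref writes to stderr.
    def log_message(self, format, *args):
        pass


def echo_path_info(environ, start_response):
    # Return PATH_INFO exactly as the server handed it to the application.
    start_response("200 OK", [("Content-Type", "text/plain; charset=latin-1")])
    return [environ["PATH_INFO"].encode("latin-1")]


def path_info_for(raw_path):
    """Send one HTTP request for raw_path and return the PATH_INFO the app saw."""
    server = make_server("127.0.0.1", 0, echo_path_info, handler_class=QuietHandler)
    server.timeout = 5  # don't block forever if the request never arrives
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        url = "http://127.0.0.1:%d%s" % (server.server_port, raw_path)
        with urllib.request.urlopen(url) as response:
            return response.read().decode("latin-1")
    finally:
        worker.join()
        server.server_close()


print(path_info_for("/test/123%2Fasd"))  # /test/123/asd
print(path_info_for("/test/123/asd"))    # /test/123/asd
```

Both requests arrive at the application as /test/123/asd, which is why the router cannot distinguish them however its patterns are written.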
How to resolve this is a difficult question. It certainly makes sense to URL-decode after isolating the path component, because that makes it far more ergonomic to match paths containing special characters such as spaces or question marks. Indeed, a route matching /test/doc?cheeky would match a document named doc?cheeky (requested as /test/doc%3Fcheeky), not a GET with a query string. The handler should definitely receive the URL-decoded path components as well. However, this scheme means that / is always interpreted as a separator, even when it shouldn't be, and the client has no way to escape it.
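One client-side workaround, sketched here as an assumption rather than anything Bottle provides, is to encode the slash twice (/ becomes %2F, then %252F). If the server decodes the path exactly once, the router sees %2F, which a plain [^/]+ wildcard treats as part of a single segment, and the handler unquotes once more to recover the original value. The helper names below are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_segment_for_client(name):
    # Hypothetical client-side step: percent-encode with no safe characters,
    # then encode the result again so "%" survives one server-side decode
    # ("/" -> "%2F" -> "%252F").
    return quote(quote(name, safe=""), safe="")


def decode_segment_in_handler(value):
    # Hypothetical handler-side step: the server already decoded once,
    # so one more unquote() restores the original name.
    return unquote(value)


name = "asdf/asdf"
raw_path = "/resources/adfs89s7/container/%s/items" % encode_segment_for_client(name)
print(raw_path)  # /resources/adfs89s7/container/asdf%252Fasdf/items

# Simulate the server decoding the path once before routing.
path_info = unquote(raw_path)
print(path_info)  # /resources/adfs89s7/container/asdf%2Fasdf/items

# The container name is now a single slash-free segment ...
segment = path_info.split("/")[4]
print(segment)  # asdf%2Fasdf
# ... and the handler restores the original value.
print(decode_segment_in_handler(segment))  # asdf/asdf
```

The cost is that every client and every handler has to agree on this extra layer of encoding, so it sidesteps the problem rather than fixing it in the router.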