url matching order changed versus 2.2
(Sorry for the convoluted title: I don't know the right vocabulary for this.)
I noticed that after upgrading to Werkzeug 2.2+, two of my routing rules swapped priority.
Sample application
Here's a simple Werkzeug application. The important part is these two rules:
Rule('/<path:filename>', endpoint=self.on_static, subdomain="static"),
Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="<subdomain>"),
cat server.py
from werkzeug.wrappers import Request, Response
from werkzeug.routing import Map, Rule
from werkzeug.exceptions import HTTPException


class SimpleServer:
    def __init__(self, server_name: str, url_map: Map):
        self._server_name = server_name
        self._url_map = url_map

    def dispatch_request(self, request):
        adapter = self._url_map.bind_to_environ(request.environ, server_name=self._server_name)
        try:
            endpoint, values = adapter.match()
            return endpoint(request, **values)
        except HTTPException as e:
            return e

    def wsgi_app(self, environ, start_response):
        request = Request(environ)
        response = self.dispatch_request(request)
        return response(environ, start_response)

    def __call__(self, environ, start_response):
        return self.wsgi_app(environ, start_response)


class Demo(SimpleServer):
    def __init__(self, server_name: str):
        url_map = Map([
            Rule('/<path:filename>', endpoint=self.on_static, subdomain="static"),
            Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="<subdomain>"),
        ])
        super().__init__(server_name, url_map)

    def on_static(self, _request, filename):
        return Response(f'on_static: {filename=}')

    def on_healthcheck(self, _request, subdomain):
        # Colon added so the response matches the curl output shown below.
        return Response(f'on_healthcheck: {subdomain=}')


if __name__ == '__main__':
    from werkzeug.serving import run_simple
    port = 8080
    app = Demo(server_name=f"example.com:{port}")
    run_simple('127.0.0.1', port, app, use_debugger=True, use_reloader=True)
Before (on werkzeug <2.2)
Run the server:
pip install werkzeug==2.1.2 && python server.py
Note how /healthcheck behaves the same on both the "static" subdomain, and a different subdomain "foo":
"static" subdomain:
$ curl http://static.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_healthcheck: subdomain='static'
"foo" subdomain:
$ curl http://foo.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_healthcheck: subdomain='foo'
After (werkzeug 2.2+)
pip install werkzeug==2.2.0 && python server.py
Note how /healthcheck now behaves differently on the two subdomains. On "static", we now get back the on_static endpoint, and on "foo" we still get back the on_healthcheck endpoint.
"static" subdomain:
$ curl http://static.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_static: filename='healthcheck'
"foo" subdomain:
$ curl http://foo.example.com:8080/healthcheck --resolve '*:8080:127.0.0.1'
on_healthcheck: subdomain='foo'
I see the same behavior with the latest version of werkzeug (3.0.3 at time of writing).
Summary
Is this change in behavior intentional? The PR https://github.com/pallets/werkzeug/pull/2433 just describes this as a faster matcher; it doesn't say anything about a change in behavior.
Is there some way of configuring a route across all subdomains that takes precedence over the subdomain specific /<path:filename> rule in my example?
Workaround
I don't have a great workaround for this. I can get close to the pre-werkzeug 2.2 behavior by adding a 3rd rule specifically for the "static" subdomain:
Rule('/<path:filename>', endpoint=self.on_static, subdomain="static"),
Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="<subdomain>"),
+Rule('/healthcheck', endpoint=self.on_healthcheck, subdomain="static"),
But this behaves a bit differently: there's no subdomain argument passed to my endpoint handler.
Environment:
- Python version: 3.12.4
- Werkzeug version: multiple, see description above
@pgjones, since you wrote the new matcher
@pgjones can you check this when you get a chance?
I'm 99% sure this is a bug in the new matcher.
https://github.com/pallets/werkzeug/blob/7868bef5d978093a8baa0784464ebe5d775ae92a/src/werkzeug/routing/matcher.py#L128
target = "/".join(parts) should be target = "/" + "/".join(parts) based on what the rules look like that should match.
Making the change fixes a similar issue on my end and all tests pass, which indicates that there is a missing test.
edit: this is not the issue, false positive related to the structure of the routes I was testing
Having dug deeper, I think the issue comes from a conflict in how RulePart weights are calculated. RulePart weighting currently uses -len(argument_weights), which I suspect was a quick fix for greedy matching of paths, likely added to satisfy test_greedy in test_routing.py.
However, test_greedy is fundamentally wrong in its current form because it tries to match two path variables back to back: /<path:bar>/<path:blub>. There is no way to specify where one ends and the next begins. If the user wants a single element at the end that is not part of <path:bar>, the correct way to express that is /<path:bar>/<blub>. That is one of only two reasonable interpretations of back-to-back paths, the other being /<bar>/<path:blub>.
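To illustrate the ambiguity, here is a minimal standalone sketch in plain Python. It is not Werkzeug code, and candidate_splits is a hypothetical helper invented for this example. It enumerates every way a URL could be divided between two adjacent path variables separated by a slash:

```python
def candidate_splits(path: str) -> list[tuple[str, str]]:
    """List the ways `path` could be divided between two adjacent
    <path:...> variables separated by a single slash."""
    segments = path.strip("/").split("/")
    # Any slash between segments is a valid boundary, so there is one
    # candidate per interior slash.
    return [
        ("/".join(segments[:i]), "/".join(segments[i:]))
        for i in range(1, len(segments))
    ]


splits = candidate_splits("/a/b/c/d")
for bar, blub in splits:
    print(f"bar={bar!r} blub={blub!r}")
```

Every pair printed is an equally valid reading of /<path:bar>/<path:blub>, so any matcher has to pick one arbitrarily.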
The simplest fix is to switch to len(argument_weights), inverting the ordering. However, two other tests fail when that change is made, so I am investigating what is going on.
Here is an example of the missing test.
from werkzeug import routing as r  # alias assumed from the r. prefix used below


def test_static_priority():
    url_map = r.Map(
        [
            r.Rule("/<path:dyn2>/<dyn1>", endpoint="file"),
            r.Rule("/<dyn1>/statn", endpoint="stat"),
        ],
    )
    adapter = url_map.bind("example.org", "/")
    assert adapter.match("/d2/d1", method="GET") == ("file", {"dyn2": "d2", "dyn1": "d1"})
    assert adapter.match("/d1/statn", method="GET") == ("stat", {"dyn1": "d1"})
I don't think this is clear-cut in Werkzeug < 2.2: if you change your path from /healthcheck to /bar, you will get the opposite result, because the match key depends on the path length. It is deterministic in >= 2.2, in that /healthcheck will never match, even if you change it to /bar. I'm not sure if this is a bug.
(I also don't think #3018 fixes this as a static part will always match in preference to a dynamic part.)
@pgjones, did you see this question in the OP?
Is there some way of configuring a route across all subdomains that takes precedence over the subdomain specific /<path:filename> rule in my example?
Your solution of adding a specific endpoint is correct, but you need to add a default for subdomain since it's not a variable in that rule:
Rule(
    '/healthcheck',
    endpoint=self.on_healthcheck,
    subdomain="static",
    defaults={"subdomain": "static"},
)
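A minimal sketch of the full rule set with that override in place, assuming Werkzeug 2.2 or later. The string endpoints and the match_on helper are placeholders for this example, not part of the original application:

```python
from werkzeug.routing import Map, Rule

url_map = Map([
    Rule("/<path:filename>", endpoint="static", subdomain="static"),
    Rule("/healthcheck", endpoint="healthcheck", subdomain="<subdomain>"),
    # Specific override for the "static" subdomain. The default supplies
    # the `subdomain` value that the dynamic rule would otherwise capture.
    Rule(
        "/healthcheck",
        endpoint="healthcheck",
        subdomain="static",
        defaults={"subdomain": "static"},
    ),
])


def match_on(subdomain, path):
    """Hypothetical helper: bind to a subdomain and match a path."""
    return url_map.bind("example.com", subdomain=subdomain).match(path)


static_result = match_on("static", "/healthcheck")
foo_result = match_on("foo", "/healthcheck")
file_result = match_on("static", "/css/site.css")
print(static_result, foo_result, file_result, sep="\n")
```

With this rule set, both subdomains should resolve /healthcheck to the healthcheck endpoint with a subdomain value, while other paths on "static" still fall through to the path rule.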
@pgjones I can't remember how rules are weighted/sorted. I'm inclined to agree with your analysis, but I think we should at least document this clearly. I'm pretty sure all the following contribute, and are stable, but it's not documented anywhere:
- The subdomain part is counted the same as path parts.
- Most converters have the same weight but a few do not.
- The number of static and dynamic parts is taken into account.
- Etc.
In the following example, I think both rules have two parts, one static and one dynamic each, and the path converter has more weight. But it's not intuitive to know what matching order this results in, or how to know that a rule would be more specific and override a more general rule as shown in the solution above.
from werkzeug.routing import Map, Rule
url_map = Map([
    Rule("/<path:filename>", endpoint="static", subdomain="static"),
    Rule("/healthcheck", endpoint="healthcheck", subdomain="<subdomain>"),
])
adapter = url_map.bind("localhost", subdomain="static", path_info="/healthcheck")
print(adapter.match())
I think the main thing to note is that the subdomain is checked before path parts, and that static subdomain matches take precedence over dynamic matches. (Static path parts also take precedence over dynamic path parts.)
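As a rough illustration of that ordering, here is a toy model in plain Python. It is a simplified sketch, not Werkzeug's actual matcher: the State class, the ANY and REST markers, and the add and match functions are all invented for this example, and converters, weights, and HTTP methods are ignored. The subdomain is treated as the first part, and at every step static transitions are tried before dynamic ones:

```python
ANY = object()   # hypothetical marker: matches exactly one part
REST = object()  # hypothetical marker: consumes all remaining parts


class State:
    def __init__(self):
        self.static = {}      # literal part -> next State
        self.dynamic = []     # (marker, next State) pairs, in insertion order
        self.endpoint = None  # set when a rule ends at this state


def add(root, parts, endpoint):
    """Register a rule given as [subdomain, *path parts]."""
    state = root
    for part in parts:
        if isinstance(part, str):
            state = state.static.setdefault(part, State())
            continue
        for marker, next_state in state.dynamic:
            if marker is part:
                state = next_state
                break
        else:
            next_state = State()
            state.dynamic.append((part, next_state))
            state = next_state
    state.endpoint = endpoint


def match(state, parts):
    if not parts:
        return state.endpoint
    head, tail = parts[0], parts[1:]
    # Static transitions are tried first.
    if head in state.static:
        found = match(state.static[head], tail)
        if found is not None:
            return found
    # Dynamic transitions are only a fallback.
    for marker, next_state in state.dynamic:
        found = match(next_state, [] if marker is REST else tail)
        if found is not None:
            return found
    return None


root = State()
add(root, ["static", REST], "on_static")           # /<path:filename> on "static"
add(root, [ANY, "healthcheck"], "on_healthcheck")  # /healthcheck on <subdomain>

before_override = match(root, ["static", "healthcheck"])
other_subdomain = match(root, ["foo", "healthcheck"])

# A fully static rule under the static subdomain wins over the path rule.
add(root, ["static", "healthcheck"], "on_healthcheck")
after_override = match(root, ["static", "healthcheck"])

print(before_override, other_subdomain, after_override)
```

In this model the literal "static" subdomain is committed to before the dynamic subdomain is considered, which is why the path rule captures /healthcheck until a more specific static rule is registered under the same subdomain.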