skipper
skipper copied to clipboard
wildcard hostname match not possible
There are some issues specific to Kubernetes:
- ingress does not allow this setting, but we can weaken RouteGroup to support it
- You can't specify
"*.example.org"
because of the regexp match that enforces[a-z0-9].
in CRD -
"*.example.org"
would need special handling, because it should be treated asHost(/[a-z0-9]+[.]example[.]org/)
The matching would need to be done carefully such that an ingress with www.example.org
is not caught by a RouteGroup with *.example.org
An example how we could use Weight()
:
./bin/skipper -inline-routes='r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>; rstar: Host(/[a-z0-9]+[.]example[.]org/) && Weight(0.1) -> status(418) -> <shunt>;'
[APP]INFO[0000] Expose metrics in codahale format
[APP]INFO[0000] support listener on :9911
[APP]INFO[0000] proxy listener on :9090
[APP]INFO[0000] TLS settings not found, defaulting to HTTP
[APP]INFO[0000] route settings, reset, route: r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>
[APP]INFO[0000] route settings, reset, route: rstar: Host(/[a-z0-9]+[.]example[.]org/) && Weight(0.1) -> status(418) -> <shunt>
[APP]INFO[0000] route settings received
[APP]INFO[0000] route settings applied
127.0.0.1 - - [07/Dec/2020:17:26:57 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:02 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:04 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:07 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
Right now the dataclient kubernetes would create errors while creating a route for host with *.example.org
:
[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings, update, route: kube_rg__default__my_route_group_1__all__2_0: PathSubtree(\"/\") && Host(\"^(*[.]example[.]org|my-rg1[.]example[.]org && Weight(0.1) -> status(418) -> inlineContent(\"I'm a teapot\") -> <shunt>"
[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings, update, route: kube_rg______example_org__catchall__0_0: Host(\"^(*[.]example[.]org)$\") -> <shunt>"
[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings received"
[APP]time="2020-12-07T16:32:30Z" level=info msg="filterRoutes incoming=148 outgoing=148"
[APP]time="2020-12-07T16:32:30Z" level=error msg="kube_rg______example_org__catchall__0_0 [22]: error parsing regexp: missing argument to repetition operator: `*`"
[APP]time="2020-12-07T16:32:30Z" level=error msg="kube_rg__default__my_route_group_1__all__1_0_https_redirect [35]: error parsing regexp: missing argument to repetition operator: `*`"
...```
@NorthFury @aryszka can this be considered safe to reduce weight by adding Weight(0.1)
?
We need to change for this what we allow as the hostname. OTOH, I would try to avoid the solution based on the Weight, because it can lead to unforeseen long term challenges.
Would an approach like the following help here?
www: Host("www[.]example[.]org") -> "docs1";
others: Host("(foo|bar|baz)[.]example[.]org") -> "docs2";
This would limit the matched hostnames to fixed set, but I believe that should be managaeble, unless we face a situation where dozens/hunderds of hostnames, or an unknown set of hostnames need to be handled. In that case the following trick can work:
specialCase: Host(".*[.]example[.]org") && Host("www[.]example[.]org") -> "docs1";
genericCase: Host(".*[.]example[.]org") -> "docs2";
Note that we don't use the Weight() predicate. The additional Host() header for the specialCase
can be added as a predicate in the route group.
This Host("(foo|bar|baz)[.]example[.]org")
does not work if the owner of the wildcard domain does not know the hostnames in advance.
What I wonder was that when I did not had the Weight() I IIRC got 404, now it just work without Weight().
% ./bin/skipper -inline-routes='r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>; rstar: Host(/[a-z0-9]+[.]example[.]org/) -> status(418) -> <shunt>;'
[APP]INFO[0000] Expose metrics in codahale format
[APP]INFO[0000] support listener on :9911
[APP]INFO[0000] proxy listener on :9090
[APP]INFO[0000] TLS settings not found, defaulting to HTTP
[APP]INFO[0000] route settings, reset, route: r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>
[APP]INFO[0000] route settings, reset, route: rstar: Host(/[a-z0-9]+[.]example[.]org/) -> status(418) -> <shunt>
[APP]INFO[0000] route settings received
[APP]INFO[0000] route settings applied
127.0.0.1 - - [08/Dec/2020:15:31:36 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:46 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:48 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:48 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:51 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:52 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
^C
for that case, when the owner doesn't know the hostnames in advance, I recommend the second example described above. Of course it still needs changes to what we allow in the route groups.
The 404
might have been just a mistake resulting in an invalid route, I cannot imagine a different case.
..., now it just works without Weight().
Careful! It's still an undefined behavior, if you restart skipper a couple of times, you will probably have different results, receiving only 418s.
for that case, when the owner doesn't know the hostnames in advance, I recommend the second example described above. Of course it still needs changes to what we allow in the route groups.
I don't think that approach is a great one. :)
The
404
might have been just a mistake resulting in an invalid route, I cannot imagine a different case.
I consider this as, I was doing something wrong. I could have tricked myself.
..., now it just works without Weight().
Careful! It's still an undefined behavior, if you restart skipper a couple of times, you will probably have different results, receiving only 418s.
I think we should consider this as bug and we should fix it long term (v1 ?) with a routing tree change. Maybe the internal proposed "hostname before path match" should be also considered. So a basically 2 tree lookups (or one tree with both), first host, second path to reduce the predicate list to match.
There is this option of having high priority predicates. Following through with this idea in the current code setup, one could have a HostStrict
predicate that needs to be an exact match. This would need to behave similarly to how the headersExact
matcher works. Basically this can add a hostExact
property to the leafMatcher
and in the matchLeaf
function it would have to be checked before matching hostRxs
.
Notes
- names are just suggestions. the
hostExact
property suggestion is to match the existingheadersExact
property in theleafMatcher
. - as far as the new predicate name goes I don't have a strong opinion except that it's not similar to the
Path
andPathRegexp
predicates and it can't be without a breaking change.
There is this option of having high priority predicates.
If it is about https://github.com/zalando/skipper/blob/4dcbf4bfffaf6d8f31bc7397dbd05181f4ff6857/proxy/proxy.go#L814-L823 that already seemed adhoc to me
The problem of wildcard domain match feels similar to wildcard path matching.
The problem of wildcard domain match feels similar to wildcard path matching.
I would argue that wildcard domain matching is much more limited in scope. I would choose not to increase the complexity because of it, at least not without having good use cases for it.
And to explain my suggestion, there is already a list of "high priority" predicates, although not marked as such. In fact, the current Host
predicate is part of that "high priority" predicates list. Here is where you can find them: https://github.com/zalando/skipper/blob/aa497f6330716505e98cfb1dd5175aed7c07c846/routing/matcher.go#L31-L43
And here is where those are used: https://github.com/zalando/skipper/blob/aa497f6330716505e98cfb1dd5175aed7c07c846/routing/matcher.go#L447-L473
@szuecs While wildcards are fine, we could solve the problem also more efficiently with a host-suffix support instead of wildcards.
@tkrop the normal way of suffix match won’t be possible for DNS (only one wildcard until the dot is possible) and I am not yet talking about implementation details only about what to achieve. We could perfectly do in skipper a suffix search and enforce via jsonSchema.
I think we should do the work around mentioned in https://github.com/zalando/skipper/issues/1637#issuecomment-740649193 to add in all cases the Host(".*[.]example[.]org")
predicate to all routes with Host()
to ensure we have 2x Host() to have the priority for non wildcard definitions.
What we need to clarify, too is:
- [ ] how does kube-ingress-aws-controller works in this case https://github.com/zalando-incubator/kube-ingress-aws-controller/issues/393
- [ ] Feature request external-dns https://github.com/kubernetes-sigs/external-dns/issues/1927
@szuecs
can this be considered safe to reduce weight by adding Weight(0.1)
I think it should be other way around - use Weight
(or True
) to give preference to exact match. Note that Weight
requires integer argument we just do not validate it strict enough and round 0.1 to 0:
https://github.com/zalando/skipper/blob/1e26fcd250309d05da5a7b086c784ba100cf07da/routing/datasource.go#L341-L355
E.g. (note that correct wildcard regexp should be more complex https://stackoverflow.com/questions/106179/regular-expression-to-match-dns-hostname-or-ip-address):
./bin/skipper -inline-routes='www: Host(/www[.]example[.]org/) && Weight(1) -> inlineContent("www") -> <shunt>; wildcard: Host(/[a-z-0-9]+[.]example[.]org/) -> inlineContent("wildcard") -> <shunt>;'
@AlexanderYastrebov I would rather use True()
, but maybe we can also change the weight in the routing package.
Maybe @aryszka has a suggestion.
Related #284
Another idea is to allow RouteGroup
without hosts
and let users use Host(/[a-z0-9]+[.]example[.]org/)
predicate:
apiVersion: zalando.org/v1
kind: RouteGroup
metadata:
name: my-api
spec:
hosts: []
backends:
- name: my-api-http
serviceName: my-api-service
servicePort: 8080
type: service
routes:
- pathSubtree: /api
predicates:
- Host(/[a-z0-9]+[.]example[.]org/)
backends:
- backendName: my-api-service
The DNS entries for the hosts then might be created either via external-dns CRD or via another auxiliary RouteGroup
that enumerates all allowed hosts:
apiVersion: zalando.org/v1
kind: RouteGroup
metadata:
name: my-api-hosts
spec:
hosts:
- foo.example.org
- bar.example.org
- baz.example.org
backends:
- name: shunt
type: shunt
routes:
- path: /whatever
predicates:
- False()
backends:
- backendName: shunt
Another usecase for the host-less RouteGroup
could be definition of cluster-wide catch-all routes without Host
predicate.
We could also create a spec.hostRegexp that is optional and one of spec.hosts and spec.hostRegexp should be defined