skipper icon indicating copy to clipboard operation
skipper copied to clipboard

wildcard hostname match not possible

Open szuecs opened this issue 4 years ago • 16 comments

There are some issues specific to Kubernetes:

  1. ingress does not allow this setting, but we can weaken RouteGroup to support it
  2. You can't specify "*.example.org" because of the regexp match that enforces [a-z0-9]. in CRD
  3. "*.example.org" would need special handling, because it should be treated as Host(/[a-z0-9]+[.]example[.]org/)

The matching would need to be done carefully such that an ingress with www.example.org is not caught by a RouteGroup with *.example.org

An example how we could use Weight():

 ./bin/skipper -inline-routes='r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>; rstar: Host(/[a-z0-9]+[.]example[.]org/) && Weight(0.1) -> status(418) -> <shunt>;'
[APP]INFO[0000] Expose metrics in codahale format
[APP]INFO[0000] support listener on :9911
[APP]INFO[0000] proxy listener on :9090
[APP]INFO[0000] TLS settings not found, defaulting to HTTP
[APP]INFO[0000] route settings, reset, route: r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>
[APP]INFO[0000] route settings, reset, route: rstar: Host(/[a-z0-9]+[.]example[.]org/) && Weight(0.1) -> status(418) -> <shunt>
[APP]INFO[0000] route settings received
[APP]INFO[0000] route settings applied
127.0.0.1 - - [07/Dec/2020:17:26:57 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:02 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:04 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [07/Dec/2020:17:27:07 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -

Right now the dataclient kubernetes would create errors while creating a route for host with *.example.org:

[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings, update, route: kube_rg__default__my_route_group_1__all__2_0: PathSubtree(\"/\") && Host(\"^(*[.]example[.]org|my-rg1[.]example[.]org && Weight(0.1) -> status(418) -> inlineContent(\"I'm a teapot\") -> <shunt>"
[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings, update, route: kube_rg______example_org__catchall__0_0: Host(\"^(*[.]example[.]org)$\") -> <shunt>"
[APP]time="2020-12-07T16:32:30Z" level=info msg="route settings received"
[APP]time="2020-12-07T16:32:30Z" level=info msg="filterRoutes incoming=148 outgoing=148"
[APP]time="2020-12-07T16:32:30Z" level=error msg="kube_rg______example_org__catchall__0_0 [22]: error parsing regexp: missing argument to repetition operator: `*`"
[APP]time="2020-12-07T16:32:30Z" level=error msg="kube_rg__default__my_route_group_1__all__1_0_https_redirect [35]: error parsing regexp: missing argument to repetition operator: `*`"
...```

szuecs avatar Dec 07 '20 15:12 szuecs

@NorthFury @aryszka can this be considered safe to reduce weight by adding Weight(0.1)?

szuecs avatar Dec 07 '20 16:12 szuecs

We need to change for this what we allow as the hostname. OTOH, I would try to avoid the solution based on the Weight, because it can lead to unforeseen long term challenges.

Would an approach like the following help here?

www: Host("www[.]example[.]org") -> "docs1";
others: Host("(foo|bar|baz)[.]example[.]org") -> "docs2";

This would limit the matched hostnames to fixed set, but I believe that should be managaeble, unless we face a situation where dozens/hunderds of hostnames, or an unknown set of hostnames need to be handled. In that case the following trick can work:

specialCase: Host(".*[.]example[.]org") && Host("www[.]example[.]org") -> "docs1";
genericCase: Host(".*[.]example[.]org") -> "docs2";

Note that we don't use the Weight() predicate. The additional Host() header for the specialCase can be added as a predicate in the route group.

aryszka avatar Dec 08 '20 14:12 aryszka

This Host("(foo|bar|baz)[.]example[.]org") does not work if the owner of the wildcard domain does not know the hostnames in advance.

What I wonder was that when I did not had the Weight() I IIRC got 404, now it just work without Weight().

% ./bin/skipper -inline-routes='r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>; rstar: Host(/[a-z0-9]+[.]example[.]org/) -> status(418) -> <shunt>;'
[APP]INFO[0000] Expose metrics in codahale format
[APP]INFO[0000] support listener on :9911
[APP]INFO[0000] proxy listener on :9090
[APP]INFO[0000] TLS settings not found, defaulting to HTTP
[APP]INFO[0000] route settings, reset, route: r0: Host(/www[.]example[.]org/) -> status(200) -> <shunt>
[APP]INFO[0000] route settings, reset, route: rstar: Host(/[a-z0-9]+[.]example[.]org/) -> status(418) -> <shunt>
[APP]INFO[0000] route settings received
[APP]INFO[0000] route settings applied
127.0.0.1 - - [08/Dec/2020:15:31:36 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:46 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:48 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:48 +0100] "GET / HTTP/1.1" 200 0 "-" "curl/7.49.0" 0 www.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:51 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
127.0.0.1 - - [08/Dec/2020:15:31:52 +0100] "GET / HTTP/1.1" 418 0 "-" "curl/7.49.0" 0 foo.example.org - -
^C

szuecs avatar Dec 08 '20 14:12 szuecs

for that case, when the owner doesn't know the hostnames in advance, I recommend the second example described above. Of course it still needs changes to what we allow in the route groups.

The 404 might have been just a mistake resulting in an invalid route, I cannot imagine a different case.

..., now it just works without Weight().

Careful! It's still an undefined behavior, if you restart skipper a couple of times, you will probably have different results, receiving only 418s.

aryszka avatar Dec 08 '20 14:12 aryszka

for that case, when the owner doesn't know the hostnames in advance, I recommend the second example described above. Of course it still needs changes to what we allow in the route groups.

I don't think that approach is a great one. :)

The 404 might have been just a mistake resulting in an invalid route, I cannot imagine a different case.

I consider this as, I was doing something wrong. I could have tricked myself.

..., now it just works without Weight().

Careful! It's still an undefined behavior, if you restart skipper a couple of times, you will probably have different results, receiving only 418s.

I think we should consider this as bug and we should fix it long term (v1 ?) with a routing tree change. Maybe the internal proposed "hostname before path match" should be also considered. So a basically 2 tree lookups (or one tree with both), first host, second path to reduce the predicate list to match.

szuecs avatar Dec 08 '20 15:12 szuecs

There is this option of having high priority predicates. Following through with this idea in the current code setup, one could have a HostStrict predicate that needs to be an exact match. This would need to behave similarly to how the headersExact matcher works. Basically this can add a hostExact property to the leafMatcher and in the matchLeaf function it would have to be checked before matching hostRxs.

Notes

  • names are just suggestions. the hostExact property suggestion is to match the existing headersExact property in the leafMatcher.
  • as far as the new predicate name goes I don't have a strong opinion except that it's not similar to the Path and PathRegexp predicates and it can't be without a breaking change.

NorthFury avatar Dec 11 '20 14:12 NorthFury

There is this option of having high priority predicates.

If it is about https://github.com/zalando/skipper/blob/4dcbf4bfffaf6d8f31bc7397dbd05181f4ff6857/proxy/proxy.go#L814-L823 that already seemed adhoc to me

The problem of wildcard domain match feels similar to wildcard path matching.

AlexanderYastrebov avatar Dec 11 '20 23:12 AlexanderYastrebov

The problem of wildcard domain match feels similar to wildcard path matching.

I would argue that wildcard domain matching is much more limited in scope. I would choose not to increase the complexity because of it, at least not without having good use cases for it.

And to explain my suggestion, there is already a list of "high priority" predicates, although not marked as such. In fact, the current Host predicate is part of that "high priority" predicates list. Here is where you can find them: https://github.com/zalando/skipper/blob/aa497f6330716505e98cfb1dd5175aed7c07c846/routing/matcher.go#L31-L43

And here is where those are used: https://github.com/zalando/skipper/blob/aa497f6330716505e98cfb1dd5175aed7c07c846/routing/matcher.go#L447-L473

NorthFury avatar Dec 12 '20 10:12 NorthFury

@szuecs While wildcards are fine, we could solve the problem also more efficiently with a host-suffix support instead of wildcards.

tkrop avatar Dec 22 '20 16:12 tkrop

@tkrop the normal way of suffix match won’t be possible for DNS (only one wildcard until the dot is possible) and I am not yet talking about implementation details only about what to achieve. We could perfectly do in skipper a suffix search and enforce via jsonSchema.

szuecs avatar Dec 22 '20 19:12 szuecs

I think we should do the work around mentioned in https://github.com/zalando/skipper/issues/1637#issuecomment-740649193 to add in all cases the Host(".*[.]example[.]org") predicate to all routes with Host() to ensure we have 2x Host() to have the priority for non wildcard definitions. What we need to clarify, too is:

  • [ ] how does kube-ingress-aws-controller works in this case https://github.com/zalando-incubator/kube-ingress-aws-controller/issues/393
  • [ ] Feature request external-dns https://github.com/kubernetes-sigs/external-dns/issues/1927

szuecs avatar Jan 19 '21 11:01 szuecs

@szuecs

can this be considered safe to reduce weight by adding Weight(0.1)

I think it should be other way around - use Weight (or True) to give preference to exact match. Note that Weight requires integer argument we just do not validate it strict enough and round 0.1 to 0: https://github.com/zalando/skipper/blob/1e26fcd250309d05da5a7b086c784ba100cf07da/routing/datasource.go#L341-L355 E.g. (note that correct wildcard regexp should be more complex https://stackoverflow.com/questions/106179/regular-expression-to-match-dns-hostname-or-ip-address):

./bin/skipper -inline-routes='www: Host(/www[.]example[.]org/) && Weight(1) -> inlineContent("www") -> <shunt>; wildcard: Host(/[a-z-0-9]+[.]example[.]org/) -> inlineContent("wildcard") -> <shunt>;'

AlexanderYastrebov avatar Nov 23 '21 22:11 AlexanderYastrebov

@AlexanderYastrebov I would rather use True(), but maybe we can also change the weight in the routing package. Maybe @aryszka has a suggestion.

szuecs avatar Nov 24 '21 15:11 szuecs

Related #284

AlexanderYastrebov avatar Nov 17 '22 10:11 AlexanderYastrebov

Another idea is to allow RouteGroup without hosts and let users use Host(/[a-z0-9]+[.]example[.]org/) predicate:

apiVersion: zalando.org/v1
kind: RouteGroup
metadata:
  name: my-api
spec:
  hosts: []
  backends:
    - name: my-api-http
      serviceName: my-api-service
      servicePort: 8080
      type: service
  routes:
    - pathSubtree: /api
      predicates:
        - Host(/[a-z0-9]+[.]example[.]org/)
      backends:
        - backendName: my-api-service

The DNS entries for the hosts then might be created either via external-dns CRD or via another auxiliary RouteGroup that enumerates all allowed hosts:

apiVersion: zalando.org/v1
kind: RouteGroup
metadata:
  name: my-api-hosts
spec:
  hosts:
    - foo.example.org
    - bar.example.org
    - baz.example.org
  backends:
    - name: shunt
      type: shunt
  routes:
    - path: /whatever
      predicates:
        - False()
      backends:
        - backendName: shunt

Another usecase for the host-less RouteGroup could be definition of cluster-wide catch-all routes without Host predicate.

AlexanderYastrebov avatar Nov 17 '22 11:11 AlexanderYastrebov

We could also create a spec.hostRegexp that is optional and one of spec.hosts and spec.hostRegexp should be defined

szuecs avatar Nov 21 '22 19:11 szuecs