CORS documentation and validation are not consistent
The docs say you can have *.com and *, and the example on https://gateway-api.sigs.k8s.io/geps/gep-1767/?h=cors shows the same. The validation only allows a fully qualified name.
I think the fix is to loosen the validation.
Also, it's not 100% clear in the spec, but I think you can only have one * and, if present, it must be the first element?
In #3667, we discussed that we will loosen this, but since it needs some research, it's better to launch with a tighter validation and loosen later. So we're removing this from the v1.3.0 milestone.
One thing that is ambiguous in the current spec is whether the following are valid:
- foo*bar.com
- *.foo.*.com
- foo.*.com

I assume no, and that you can only have exactly one wildcard, as the prefix, but we should codify this and validate it.
The first thing is that patterns in allowOrigins should normally consist of scheme, host, and port. So *.foo.com matches both http://xyz.foo.com:80 and https://abc.foo.com:443. In the discussion on the original PR for GEP-1767, there are examples like allowOrigins: ["https://*.foo.example", "http://*.foo.example"], which we probably should support and include in the documentation. CORS allows Access-Control-Allow-Origin: *, so we should also support the catch-all allowOrigins: ["*"].
As for * appearing in the middle of the host, it seems from the above-linked discussion that the intention there was to allow wildcarding subdomains rather than to support some fancy regexes, so I think we should not require supporting patterns as above (should we explicitly forbid them in validation?).
GEP-1767 says that * is greedy match to the _left_. The only search result for this phrase (without underscores) is the Gateway API reference, so I suppose it's not a fixed expression. Since regex is by default left-to-right (so it is in a sense "greedy from the left"), I guess that by to the _left_ the GEP author meant that there should be only one * and it should be the leftmost part of the pattern (or at least of the host part of the pattern). Could you please comment on that @arkodg ?
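To make the reading above concrete, here is a minimal sketch of what such a validation rule could look like. It is not the Gateway API's actual validation: the function name `validOriginPattern` is hypothetical, and it assumes that a pattern is either the catch-all `*` or `scheme://host[:port]` with at most one `*`, which must be the entire leftmost label of the host. Port syntax and IPv6 literals are not checked.

```go
package main

import "strings"

// validOriginPattern is a hypothetical helper, not part of Gateway API.
// Assumed rule: the pattern is either "*" or scheme://host[:port], where the
// host contains at most one "*" and that "*" is the whole leftmost label.
func validOriginPattern(p string) bool {
	if p == "*" {
		return true // catch-all
	}
	scheme, rest, ok := strings.Cut(p, "://")
	if !ok || scheme == "" || strings.Contains(scheme, "*") {
		return false // a scheme is required and cannot be wildcarded
	}
	host := rest
	if i := strings.LastIndex(rest, ":"); i >= 0 {
		host = rest[:i] // drop the port; its syntax is not checked here
	}
	if host == "" {
		return false
	}
	switch strings.Count(host, "*") {
	case 0:
		return true // plain origin
	case 1:
		// The single wildcard must be the first label, followed by a real suffix.
		return strings.HasPrefix(host, "*.") && len(host) > 2
	default:
		return false // more than one wildcard
	}
}
```

Under these assumptions `https://*.foo.example` and `*` pass, while host parts such as foo*bar.com, *.foo.*.com and foo.*.com are rejected. A bare *.foo.com is also rejected here because it has no scheme; whether scheme-less patterns should be accepted is a separate decision.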
In general, that idea - that a * must be the leftmost character - is pretty much a base assumption in the API on basically all uses of wildcards. In hostname, for example, a * must be the first DNS label (an alphanumeric string separated by .), and can only replace a single label. (So *.example.com only matches foo.example.com, not bar.baz.example.com.)
I think your reading in the last paragraph matches what I remember of these conversations. But, given that I don't do CORS much, it's probably more important that this wildcard behaves in the way that CORS users would expect, so I'd be in favor of doing whatever maximizes that.
According to both the specification and Mozilla docs, the value of the Access-Control-Allow-Origin header is one of:
- a specific origin (scheme + host (+ port)),
- the catch-all wildcard *,
- null (discouraged).
This means that if allowOrigins in HTTPCORSFilter does not contain "*", but something like "https://*.foo.example", it will always end up in the Access-Control-Allow-Origin header as a specific origin (namely the one from the Origin header of the HTTP request). So these patterns with wildcards longer than "*" are only relevant to implementations, not to clients.
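The point can be illustrated with a small sketch of how an implementation might pick the header value. The names `matchOrigin` and `allowOriginHeader` are hypothetical, and the matching is deliberately simplistic: a single `*` in a pattern is treated as "any non-empty text", so it also spans several labels, which may or may not be the intended semantics.

```go
package main

import "strings"

// matchOrigin is a hypothetical matcher: a pattern without "*" must equal the
// origin; a pattern with one "*" matches when the text before and after the
// "*" are a prefix and suffix of the origin with something in between.
func matchOrigin(pattern, origin string) bool {
	prefix, suffix, wild := strings.Cut(pattern, "*")
	if !wild {
		return pattern == origin
	}
	return len(origin) > len(prefix)+len(suffix) &&
		strings.HasPrefix(origin, prefix) &&
		strings.HasSuffix(origin, suffix)
}

// allowOriginHeader returns the value to send in Access-Control-Allow-Origin
// and false when no header should be sent.
func allowOriginHeader(allowOrigins []string, origin string) (string, bool) {
	for _, p := range allowOrigins {
		if p == "*" {
			return "*", true // only the catch-all is passed through literally
		}
	}
	for _, p := range allowOrigins {
		if matchOrigin(p, origin) {
			return origin, true // echo the request's Origin, never the pattern
		}
	}
	return "", false
}
```

With allowOrigins set to `["https://*.foo.example"]` and a request Origin of `https://abc.foo.example`, this returns the request's origin itself, so the client never sees the wildcard pattern.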
Speaking about implementations, in Envoy the CORS configuration offers multiple origin matching options (e.g. prefix, suffix, contains, and even custom or safe_regex via Google RE2). The pattern "https://*.foo.example" has a wildcard in the middle, but the matching options are mutually exclusive, so one cannot specify prefix and suffix simultaneously. However, such a pattern should be easily expressible with safe_regex.
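As a rough illustration of that last point, a wildcard pattern can be turned into an RE2-style expression by escaping the literal parts and substituting a character class for the `*`. This is only a sketch: `originPatternToRegex` and `regexMatchesOrigin` are hypothetical names, Go's `regexp` package (which uses RE2 syntax) stands in for Envoy's matcher, and the wildcard is assumed to cover exactly one DNS label.

```go
package main

import (
	"regexp"
	"strings"
)

// originPatternToRegex converts a wildcard origin pattern into an anchored
// RE2-style expression. Assumption: each "*" stands for exactly one DNS label.
func originPatternToRegex(pattern string) string {
	parts := strings.Split(pattern, "*")
	for i, part := range parts {
		parts[i] = regexp.QuoteMeta(part) // escape literal dots etc.
	}
	return "^" + strings.Join(parts, "[A-Za-z0-9-]+") + "$"
}

// regexMatchesOrigin compiles the generated expression and tests an origin.
func regexMatchesOrigin(pattern, origin string) bool {
	return regexp.MustCompile(originPatternToRegex(pattern)).MatchString(origin)
}
```

For `https://*.foo.example` this yields `^https://[A-Za-z0-9-]+\.foo\.example$`. If the wildcard should be allowed to span several labels, the character class would need to include `.` as well.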