Enhanced host matching

Open dadrus opened this issue 10 months ago • 2 comments

Preflight checklist

  • [x] I agree to follow this project's Code of Conduct.
  • [x] I have read and am following this repository's Contribution Guidelines.
  • [x] I have discussed this feature request with the community.

Describe the background of your feature request

Heimdall’s current rule matching relies heavily on path specificity: the path is matched first in a radix tree, and host matching is treated as an additional condition evaluated afterwards. This design works efficiently when rules differ by path, leveraging the radix tree’s O(log(n)) lookup performance. However, it becomes unintuitive and inefficient when rules differ primarily by host but share the same path. For example, defining rules for a.example.com (1) and b.example.com (2), both with path: /**, requires enabling backtracking to function correctly. Without backtracking, whichever of the two rules is added second will never match. This is counterintuitive, as community members have also noted.

Additionally, when rules differ solely by host, the current system degrades to iterating over all rules that match the path and checking the host conditions of each one. This linear scan negates the radix tree’s performance advantage. Users reasonably expect host and path to be treated symmetrically, both in how they are configured and in how efficiently they are matched, without workarounds such as backtracking (which was actually designed for other purposes) or overly complex configurations.

Describe your idea

Host matching could also be done by integrating it into a radix tree with a host-first matching approach, paired with a unified match configuration that treats hosts and paths consistently.

Ideally the match block of a rule would look like:

match:
  hosts:              # Optional if routes are present
    - <host pattern>  # e.g., "a.example.com" or "*.example.com"
  routes:             # Optional if hosts are present
    - path: <path pattern>  # e.g., "/api/*"

with the ability to define host-only, path-only and combined match expressions:

  • Host-only: hosts: ["a.example.com"] (applies to all paths)
  • Path-only: routes: [{ path: "/api/*" }] (applies across hosts)
  • Combined: hosts: ["b.example.com"], routes: [{ path: "/api/*" }]

To preserve backwards compatibility, however, the implementation should not break the current hosts configuration options. Instead, it should add a new host type, e.g. named wildcard. The above definition would then become:

match:
  hosts: 
    - type: wildcard
      value: a.example.com # or e.g. with a wildcard - "*.example.com"
  routes:
    - path: <path pattern>

All other host matching types - exact, glob, and regex - would become deprecated. Warnings should be emitted if these types are used. A version later, we could remove support for these types and simplify the hosts array to be just an array of values. A migration to the new structure could even be done automatically.

Only the new wildcard type will implement the radix tree lookup, using reversed domain notation (e.g., com.example.a for a.example.com) for efficient prefix-based matching with O(log(n)) performance. The deprecated types (exact, glob, regex) will retain their current behavior, ensuring compatibility during the transition.
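To make the reversed-notation idea concrete, here is a minimal sketch in Go. It is not heimdall's implementation: it uses a plain per-label trie rather than a compressed radix tree, the names (node, Insert, Lookup, the rule IDs) are hypothetical, and it assumes that "*.example.com" covers one or more leading labels but not example.com itself.

```go
package main

import "strings"

// node is one label of a reversed host name (hypothetical type, for illustration only).
type node struct {
	children map[string]*node
	exact    string // rule ID registered for exactly this host, "" if none
	wildcard string // rule ID registered for "*.<this host>", "" if none
}

func newNode() *node { return &node{children: map[string]*node{}} }

// reversedLabels turns "a.example.com" into ["com", "example", "a"].
func reversedLabels(host string) []string {
	labels := strings.Split(strings.ToLower(host), ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return labels
}

// Insert registers ruleID for a pattern that is either a full host
// ("a.example.com") or a single leading wildcard ("*.example.com").
func (n *node) Insert(pattern, ruleID string) {
	wild := strings.HasPrefix(pattern, "*.")
	if wild {
		pattern = strings.TrimPrefix(pattern, "*.")
	}
	cur := n
	for _, label := range reversedLabels(pattern) {
		next, ok := cur.children[label]
		if !ok {
			next = newNode()
			cur.children[label] = next
		}
		cur = next
	}
	if wild {
		cur.wildcard = ruleID
	} else {
		cur.exact = ruleID
	}
}

// Lookup walks the host from the TLD inwards. An exact match wins;
// otherwise the longest matching wildcard suffix seen on the way is used.
func (n *node) Lookup(host string) (string, bool) {
	labels := reversedLabels(host)
	cur := n
	best := ""
	for i, label := range labels {
		next, ok := cur.children[label]
		if !ok {
			break
		}
		cur = next
		if i == len(labels)-1 {
			if cur.exact != "" {
				return cur.exact, true
			}
		} else if cur.wildcard != "" {
			// At least one more label follows, so "*.<suffix>" applies.
			best = cur.wildcard
		}
	}
	return best, best != ""
}
```

With a.example.com and *.example.com both registered, a lookup for a.example.com returns the exact rule, b.example.com falls back to the wildcard rule, and example.com matches neither. In this sketch the number of steps depends on the number of labels in the host, not on the number of registered rules.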

I think this approach significantly improves lookup performance when rules differ by host, aligning it with the efficiency of path matching.

Are there any workarounds or alternatives?

Current workarounds include:

  • Backtracking: Set backtracking_enabled: true to handle rules with identical paths but different hosts (e.g., a.example.com/** vs. b.example.com/**). As written above, this is unintuitive and adds overhead, as users expect host-based differentiation to work without additional configuration.
  • Unique Paths: Define a distinct path for each service rather than a distinct host (e.g., example.com/a/** vs. example.com/b/**). This avoids the issue but is often impractical due to the existing API design.

Version

0.15.0

Additional Context

Related discussion in Discord: https://discord.com/channels/1100447190796742698/1100447191358787646/1352195852957126677

dadrus, Apr 11 '25 19:04

Full disclosure, I’m not using hosts … but this approach makes sense to me!

I assume, given the radix tree approach, that “wildcard” type hosts only support a single leading wildcard (or no wildcard), not wildcards in the middle. No idea if that affects people, but I’d suggest making this very clear in the docs for the “wildcard” type. The one time I saw something like this come up in a similar use case (auth callback allowlists), somebody had something like an ephemeral feature-branch deployment with a random component that wasn’t the leading part.

emsearcy, Apr 12 '25 00:04

Thank you for the comment, @emsearcy. Yeah, the approach would indeed allow only one leading wildcard. So, things like a.b.c.com, *.b.c.com, *.c.com would be possible, but not a.*.c.com. In principle that would replace the exact type and to some degree the glob type.
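The restriction described above could be enforced when rules are loaded. The following is only a sketch: isValidWildcardHost is a hypothetical helper, not an existing heimdall function, and it checks nothing beyond the position of the wildcard and the absence of empty labels.

```go
package main

import "strings"

// isValidWildcardHost reports whether pattern is either a plain host name
// or a host name with exactly one leading "*." wildcard.
// Hypothetical helper for illustration; it does not validate label characters.
func isValidWildcardHost(pattern string) bool {
	// Strip the single allowed leading wildcard, if present.
	rest := strings.TrimPrefix(pattern, "*.")
	// Nothing left, or another "*" anywhere else, is rejected.
	if rest == "" || strings.Contains(rest, "*") {
		return false
	}
	// Reject empty labels such as "a..com" or a trailing dot.
	for _, label := range strings.Split(rest, ".") {
		if label == "" {
			return false
		}
	}
	return true
}
```

Under these assumptions a.b.c.com, *.b.c.com and *.c.com are accepted, while a.*.c.com is rejected.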

dadrus, Apr 14 '25 06:04