caddy Provide all core matchers as CEL functions

CEL is super cool and powerful, and it makes it easier to write complex conditional logic for request matching. We can use placeholders to pull values from the request and write conditions with them.

One thing that would make it even better though, is if all our existing request matchers were added to the CEL context as functions.

Consider this Caddyfile (example of doing all the permutations of conditional logic):

localhost

@both {
	path /abc/*
	method GET
}
respond @both "Both"

@neither {
	not path /abc/*
	not method GET
}
respond @neither "Neither"

@not_both {
	not {
		path /abc/*
		method GET
	}
}
respond @not_both "Not both"

@either {
	not {
		not path /abc/*
		not method GET
	}
}
respond @either "Either"

This is pretty wacky. Especially the "either" case.

If we had the matchers as CEL functions, we could do something like this for each of them (not all of these make sense as expressions, but this is just an example):

localhost

@both expression `path("/abc/*") && method("GET")`
respond @both "Both"

@neither expression `!path("/abc/*") && !method("GET")`
respond @neither "Neither"

@not_both expression `!path("/abc/*") || !method("GET")`
respond @not_both "Not both"

@either expression `path("/abc/*") || method("GET")`
respond @either "Either"

This reads a lot better, especially if you're comfortable with programming boolean logic.
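As a quick sanity check on the boolean logic, here is a minimal Go sketch that models the four named matchers from the first Caddyfile as plain functions over two booleans (p for "path matches", m for "method matches"). The function names are made up for illustration. It assumes that conditions inside one matcher block are ANDed and that `not` negates a single condition or a nested block.

```go
package main

import "fmt"

// Each function mirrors one named matcher block from the first Caddyfile.
// p = the path condition holds, m = the method condition holds.
func both(p, m bool) bool    { return p && m }
func neither(p, m bool) bool { return !p && !m }
func notBoth(p, m bool) bool { return !(p && m) }
func either(p, m bool) bool  { return !(!p && !m) }

func main() {
	for _, p := range []bool{false, true} {
		for _, m := range []bool{false, true} {
			// By De Morgan's laws, the nested "not" forms reduce to
			// plain || expressions; both comparisons hold for every row.
			fmt.Println(p, m,
				notBoth(p, m) == (!p || !m),
				either(p, m) == (p || m))
		}
	}
}
```

So the "not both" block corresponds to `!path || !method`, and the doubly negated "either" block corresponds to `path || method`.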

May 25 '20 16:05 francislavoie

@TristonianJones I was hoping you could clarify something - I've been digging through the CEL docs and can't figure out whether it's possible to have variadic args for function overloads. I did see that && is noted as variadic, so I assume there must be a way.

Specifically, I want to be able to support all of the following:

path(request, "/foo/*")
path(request, "/foo/*", "/bar/*")
path(request, "/foo/*", "/bar/*", "/baz/*")
...

(Note that I'll be using a regexp to expand path(args) to path(request, args) so that users don't need to specify request, which would be unnecessarily verbose in this context)
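A minimal sketch of that regexp expansion, using only Go's standard library. The function name `injectRequest` and the list of matcher names in the pattern are assumptions for illustration, not anything from Caddy.

```go
package main

import (
	"fmt"
	"regexp"
)

// matcherCall finds the opening of a call to a known matcher function.
// The names listed here are illustrative only.
var matcherCall = regexp.MustCompile(`\b(path|method)\(`)

// injectRequest rewrites name(args) into name(request, args) so that users
// do not have to pass the request explicitly.
func injectRequest(expr string) string {
	return matcherCall.ReplaceAllString(expr, "${1}(request, ")
}

func main() {
	fmt.Println(injectRequest(`path("/foo/*", "/bar/*") && method("GET")`))
	// prints: path(request, "/foo/*", "/bar/*") && method(request, "GET")
}
```

A plain textual rewrite like this has limits: it would also touch a `path(` that appears inside a string literal, and it would produce `path(request, )` for a call with no arguments, so a real implementation would likely need something more robust than a regexp.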

Do I just use dyn as the last type? Do I have to have users wrap args in a [] list instead?

May 26 '20 03:05 francislavoie

@francislavoie The CEL proto supports variadic arguments at an AST level, but the type checker and interpreter don't currently support variadic functions. Using a list literal as an argument is the simplest approximation to what you want: path(request, [<arg_exprs>]). Though you could also add overloads to the type-checker and interpreter for arg counts 1 .. N where N is some reasonable number to simulate what you want:

path(request, string)
path(request, string, string)
path(request, string, string, string) // and so on until you hit some reasonable limit.

If you give the overloads distinct names then the type-checker will also make it simple for the interpreter to dispatch to the correct overload at runtime: path_request_string, path_request_string_2, ..., path_request_string_20. It's currently not quite as simple as I'd like to set up the interpreter to specify both the dynamic dispatch form of the function and the specialized overload, but it's still doable and you can see an example of what I'm talking about in cel-go/ext/strings.go where I have to set up a function to handle calls to either replace or string_replace_string_int.
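To make the naming scheme concrete, here is a small Go sketch that generates the distinct overload names for 1 to N string arguments. The helper `overloadID` is hypothetical and does not call into cel-go; it only reproduces the pattern of names listed above.

```go
package main

import "fmt"

// overloadID returns the overload name for a function taking the request
// plus n string arguments, following the naming pattern from the thread:
// path_request_string, path_request_string_2, ..., path_request_string_20.
func overloadID(fn string, n int) string {
	if n == 1 {
		return fn + "_request_string"
	}
	return fmt.Sprintf("%s_request_string_%d", fn, n)
}

func main() {
	// Print the first few overload names for a "path" function.
	for n := 1; n <= 3; n++ {
		fmt.Println(overloadID("path", n))
	}
}
```

Each generated name would then need its own declaration in the type-checker and a binding in the interpreter, which is why the upper bound N has to stay reasonably small.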

There's nothing really preventing us from adding support for variadic functions, but we'd need to make sure it can be specified and understood in all runtimes. For us, this is mostly just a prioritization question of which things to work on first. Recently, there have been feature requests for aliased identifiers and expression linting, and adding var-args functions into the mix means that the CEL user base is really maturing pretty quickly now. Exciting stuff. I hope I can get to it all soon. :)

May 26 '20 04:05 TristonianJones

Thanks for the answer! Sounds good.

May 26 '20 04:05 francislavoie

Good idea! As long as these functions are only in the global scope for request matcher CEL expressions, then this should be fine. (As opposed to making these functions global for all CEL expressions in case CEL is used anywhere else in Caddy -- the global namespace is too valuable in that case).

May 26 '20 19:05 mholt

So I'm coming back to this to think about how it would work, and I'm struggling to see how it could be done efficiently.

The problem is that matchers actually have their arguments set up on their struct, and the Match method takes only the request. That means that we would need to construct the MatchPath struct (for example) on every request and fill it with the args from the CEL function invocation, so that we can then call Match(requestFromContext) on it.
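The concern can be illustrated with a simplified Go sketch. `pathMatcher`, `naivePath`, and `requestFor` are hypothetical stand-ins, not Caddy's actual types, and `path.Match` is used only for brevity since Caddy's real path matching behaves differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"path"
)

// pathMatcher is a hypothetical, simplified matcher whose arguments live on
// the struct. It is not Caddy's MatchPath.
type pathMatcher struct {
	Patterns []string
}

// Match takes only the request, as described above.
func (m pathMatcher) Match(r *http.Request) bool {
	for _, p := range m.Patterns {
		if ok, err := path.Match(p, r.URL.Path); err == nil && ok {
			return true
		}
	}
	return false
}

// naivePath is what a CEL function binding would do without any
// precompilation: build a fresh matcher from the call arguments every time.
func naivePath(r *http.Request, patterns ...string) bool {
	return pathMatcher{Patterns: patterns}.Match(r)
}

// requestFor builds a bare GET request for the given path (demo helper).
func requestFor(p string) *http.Request {
	return &http.Request{Method: http.MethodGet, URL: &url.URL{Path: p}}
}

func main() {
	fmt.Println(naivePath(requestFor("/abc/x"), "/abc/*"))           // true
	fmt.Println(naivePath(requestFor("/other"), "/abc/*", "/foo/*")) // false
}
```

For a trivial matcher this per-request construction is cheap, but any matcher that needs a provisioning step (setting defaults, compiling patterns) would repeat that work on every request.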

I don't think it's possible to pre-allocate the struct based on the compiled CEL expression to avoid creating the struct on every request...

I guess we could have all the matchers provide a struct-less Match function for the purposes of making it work for CEL, but is that worth the maintenance overhead? And if they're struct-less, how could they be loaded dynamically by their module name? 🤔

I don't know what the way forwards is here 😢

Sep 20 '20 18:09 francislavoie

What if, instead, you just had access to the same properties of the request that matchers have, and you write your own "matcher"? i.e. something like request.uri.path or whatever. Oh wait, that's like {placeholders} already? (Although they get converted to strings, so we lose some type information.) We could probably make a strongly-typed request value for CEL, right?

Sep 25 '20 22:09 mholt

Well @mholt the point was to get the full semantics of the existing matchers we have, like the file matcher, but be able to compose boolean logic out of them more easily than with a bunch of not matchers. You can already do stuff like path and method matchers easily in CEL because we already have the request and placeholders, but that's not enough for things that need to read from disk or whatever.

Sep 25 '20 23:09 francislavoie

I understand that, and I like the idea, but I dunno how feasible it is compared to what we get from it. Most of the matchers' functionality could be pretty easily exposed if we had a strongly-typed Request value, I think? And that might be easier. I'm just trying to see what is more achievable, if it's worth doing anything at all for this. But let's see if we can find a way to make existing matchers accessible, I guess.

Sep 25 '20 23:09 mholt

I haven't really given myself a refresher on this in quite a while @TristonianJones but as a followup to https://github.com/caddyserver/caddy/pull/4264#issuecomment-922375797, I guess the main question at this point is:

Is there a way to provision an interface{} from the "inputs" to a CEL function at compilation time? Caddy's matchers are types which implement a Match(r *http.Request) bool function. We need a way to initialize those types.

Essentially I'm thinking, consider a CEL expression like this (contrived example):

file({"try_files": ["{path}", "{path}/", "index.php"]}) && method(["GET"]) || path(["/foo*", "/bar*"])

We'd need a hook during compilation to take file, and unmarshal the JSON in the first param into the MatchFile type, call Provision(ctx) on it to let it do any initialization (optimizing params or something, setting defaults), then store that type in a map[int]interface{} with some int ID, then actually transform the function in the CEL to something like file_<id>(request) which would actually call Match(request) at runtime on the given instance.
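Here is a rough Go sketch of the bookkeeping described above, leaving out the CEL side entirely. All names (`registry`, `register`, `call`, `methodMatcher`, `reqWithMethod`) are hypothetical, and the map is typed to a small interface rather than `interface{}` to keep the example short.

```go
package main

import (
	"fmt"
	"net/http"
)

// requestMatcher is the shape described in the thread.
type requestMatcher interface {
	Match(r *http.Request) bool
}

// methodMatcher is a hypothetical stand-in for a provisioned matcher.
type methodMatcher []string

func (m methodMatcher) Match(r *http.Request) bool {
	for _, name := range m {
		if r.Method == name {
			return true
		}
	}
	return false
}

// registry stores matchers built at compile time, keyed by an int ID.
type registry struct {
	matchers map[int]requestMatcher
	next     int
}

func newRegistry() *registry {
	return &registry{matchers: make(map[int]requestMatcher)}
}

// register stores an already provisioned matcher and returns its ID together
// with the rewritten call that would replace the original expression text.
func (reg *registry) register(name string, m requestMatcher) (int, string) {
	id := reg.next
	reg.next++
	reg.matchers[id] = m
	return id, fmt.Sprintf("%s_%d(request)", name, id)
}

// call is what the rewritten function would do at runtime.
func (reg *registry) call(id int, r *http.Request) bool {
	return reg.matchers[id].Match(r)
}

// reqWithMethod builds a bare request with the given method (demo helper).
func reqWithMethod(method string) *http.Request {
	return &http.Request{Method: method}
}

func main() {
	reg := newRegistry()
	id, rewritten := reg.register("method", methodMatcher{"GET"})
	fmt.Println(rewritten)                        // method_0(request)
	fmt.Println(reg.call(id, reqWithMethod("GET"))) // true
}
```

The open question in the comment is where the `register` step would run: in Caddy before handing the expression to CEL, or inside a hook in CEL's own compilation phase.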

I have no idea if this is possible, or how it would be implemented. How much of this parsing would need to be done ahead of time by Caddy, and how much is possible via some kind of hooks into CEL's compilation phase?

Hopefully you can follow what I'm trying to say, and that it doesn't sound like babbling 😅

Sep 18 '21 21:09 francislavoie

@francislavoie I have found a somewhat convoluted way to provide the experience you originally outlined:

method("GET") && path("/foo/*") || path("/foo/*", "/bar/*")

The trick is to use CEL parser.Macro objects to rewrite the variadic argument expressions and to inject a hidden request lookup as an argument to a hidden function declaration (lots of hidden things). The function declaration will create a matcher on the fly and evaluate against the request, e.g.

MatchMethod(methods.([]string)).Match(request)

I took a stab at optimizing this path using a cel.CustomDecorator and have been able to precompile the matcher to get the same performance benefit you would expect from natively using the matcher. My only concern is that it's a lot of code to support the matchers in CEL this way. As matchers may not be added often, this may be acceptable. However, I wanted to check in to see if the maintenance burden is acceptable before I implement the remaining matchers.

Thoughts?

Apr 16 '22 07:04 TristonianJones

Wow, that sounds great! Yeah, essentially what I had in mind.

You're right, amount of code/maintainability is a concern, but ultimately the code would probably be pretty boilerplate-y I think, mostly just initializing a struct/type's data so I figure it might be okay.

We don't plan on changing/adding matchers very often; we're kinda locked on their API surface (for existing ones) so changing would be a breaking change for JSON configs; there's only so much you can match in a request (not much left to add). Can't rule it out completely, obviously, but yeah it'll probably be pretty "write-once" here to add support for this.

I think what would make sense is to have each of the matcher types (like MatchMethod etc) implement a func/interface, like UnmarshalExpression or something, which gets called by MatchExpression itself when an unknown function (or known, idk) is encountered in parsing, to trigger the macro logic for that matcher; the module for each matcher can be loaded by module ID like http.matchers.<name>.

(Btw, you can see a list of known modules here https://caddyserver.com/docs/modules/, Ctrl+F for http.matchers; only a few third party ones, not so important that third party matchers support CEL)

How would it look in theory for matchers like file which have more complex config? Would it work to parse the args as JSON directly?

Apr 16 '22 12:04 francislavoie

How we are thinking about support for the existing matchers might be a bit different. What I'm aiming for would add compile time cost, but minimal runtime cost.

Use parser.Macro objects to expose methods with signatures aligned to the existing behavior you have for matchers
Have the parser.MacroExpander rewrite the function to one that takes a request argument in addition to a list or map of arguments that the user would normally specify in the matcher:

macro: method("GET", "POST")
expanded call: method(request, ["GET", "POST"])

Use an interpreter.InterpretableDecorator to manipulate the program execution: precompile a MatchMethod instance from the arguments referenced in the second arg to the real method overload, and return an InterpretableCall instance which does a lookup of request and passes it through to evaluate against the precompiled MatchMethod instance.

// planned call:
func(request, methodNames ref.Val) ref.Val {
	// error, do nothing, implement ... just a stub really.
}
// decorated call: precompiles the `methodNames` arg into a CEL `interpreter.InterpretableConst`
// and then further precompiles this value to a MatchMethod instance.
func(args ...ref.Val) ref.Val {
	request := args[0].Value().(*http.Request)
	return types.Bool(precompiled_match.Match(request))
}

For matchers that take structured inputs, validation of the input content can happen both during the macro expansion as well as during the program plan step when the decorator is run. Effectively, you'd be getting nearly exactly the same speed of execution and experience with richer expression support. And you're right, there's a lot of boilerplate that can be consolidated. I'm currently having each matcher implement the cel.Library interface so it's easy to take an instance of the matcher and add it into the base cel.Env used for Caddy matchers. Any matcher that wants to be supported within CEL can do so by implementing the cel.Library interface (or maybe we find a cleaner way to package this that's more Caddy-esque). I'm wide open to feedback here.
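The precompilation idea can be sketched in plain Go without cel-go. This is only an analogy for what the decorator achieves, under the assumption that the matcher arguments are constants known at plan time; `planMethodCall` and `withMethod` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// planMethodCall runs once, at "plan" time. Because the method names are
// constants in the expression, the lookup set can be built here instead of
// on every request.
func planMethodCall(methods ...string) func(*http.Request) bool {
	allowed := make(map[string]struct{}, len(methods))
	for _, m := range methods {
		allowed[strings.ToUpper(m)] = struct{}{}
	}
	// The returned closure is the only code that runs per request: it looks
	// up the request's method in the precompiled set.
	return func(r *http.Request) bool {
		_, ok := allowed[r.Method]
		return ok
	}
}

// withMethod builds a bare request with the given method (demo helper).
func withMethod(method string) *http.Request {
	return &http.Request{Method: method}
}

func main() {
	match := planMethodCall("GET", "POST")
	fmt.Println(match(withMethod("GET")))    // true
	fmt.Println(match(withMethod("DELETE"))) // false
}
```

The trade described in the comment is the same one visible here: a little more work when the expression is compiled, in exchange for a per-request cost close to calling the matcher directly.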

Does this sound reasonable to you?

Apr 16 '22 18:04 TristonianJones

Yeah, sounds perfect!

Thanks so much for looking into this, pretty exciting that we can make it happen ☺️

I'm away for the weekend so I'm just looking at this from my phone, I haven't looked at the CEL interfaces to internalize how it would work. Sounds like a plan to me, though!

If you're willing to make a PR to get one or two matchers working, I could probably add support for the rest!

Apr 16 '22 20:04 francislavoie

@francislavoie no worries. I should be able to get all of the standard http matchers configured, though may have to leave the file matchers as an exercise to be completed later. It at least paves the way for incremental support and a seamless transition from the existing format to one that uses more of CEL.

Apr 16 '22 21:04 TristonianJones

Thanks both for working on this! It's a way cool improvement to Caddy.

I'll close this, since most of the common matchers have CEL functions now; we can always add more if needed I guess. Thank you again!

Sep 14 '22 04:09 mholt

What's the equivalent CEL of the following Caddyfile?

    @allowed {
        not {
            not remote_ip private_ranges
            not maxmind_geolocation {
                db_path "./Country.mmdb"
                allow_countries US
            }
        }
    }

Nov 22 '23 06:11 m2acgi

Plugin matchers can't be used in CEL unless they implement some functions to fulfill an interface. Ask the plugin maintainer to add the feature.
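As a rough illustration of what "fulfilling an interface" means here, the Go sketch below uses a made-up interface, `celCapable`, and two made-up matcher types. It is not Caddy's actual interface or API; it only shows the opt-in mechanism via a type assertion.

```go
package main

import "fmt"

// celCapable is a hypothetical opt-in interface. A matcher that implements
// it can be exposed as a CEL function; one that does not cannot.
type celCapable interface {
	CELFunctionName() string
}

// coreMatcher stands in for a built-in matcher that has opted in.
type coreMatcher struct{}

func (coreMatcher) CELFunctionName() string { return "remote_ip" }

// pluginMatcher stands in for a third-party matcher without CEL support.
type pluginMatcher struct{}

// celName reports the CEL function name for a matcher, if it has one.
func celName(m interface{}) (string, bool) {
	c, ok := m.(celCapable)
	if !ok {
		return "", false
	}
	return c.CELFunctionName(), true
}

func main() {
	fmt.Println(celName(coreMatcher{}))   // remote_ip true
	fmt.Println(celName(pluginMatcher{})) //  false
}
```

In this model, only the plugin's maintainer can add the missing method, which is why the request has to go to them.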

Nov 22 '23 08:11 francislavoie

Thanks.

Nov 22 '23 08:11 m2acgi
