Impossible to bind root "catch-all" handler in standalone `httpd`

Open nafarlee opened this issue 5 months ago • 30 comments

Description

When creating handlers for the standalone httpd server, it is not possible to bind a handler to / such that it handles all paths (or at least, all paths without a more specific handler match). I am not sure if this is intentional, but:

  • This behavior is pretty surprising when compared to how handler routing works with at least one subpath (shown in the Root Cause section)
  • I could see some good use cases for having a "catch-all" handler, either for very dynamic routing scenarios or to customize a "no route" response

Unfortunately, if I am reading the source correctly, a server that provides a catch-all handler would then be unable to serve static files and servlets, because handlers are resolved before them. We would probably need to change the resolution order to fix that.

Reproduction

gerbil.pkg

(package: example)

build.ss

#!/usr/bin/env gxi
(import :std/build-script)

(defbuild-script
  '("handler"))

handler.ss

(import (only-in :std/net/httpd
                 http-response-write))

(export handle-request handler-init!)

(def (handler-init! _)
  (displayln "Now listening..."))

(def (handle-request req res)
  (http-response-write res 200 [] "Hello, World!"))

Server Terminal

❯ gerbil build && gerbil httpd -G .gerbil config --handlers '(("/" :example/handler))' && gerbil httpd -G .gerbil server
... build in current directory
Now listening...

Client Terminal

❯ curl localhost:8080
Hello, World!⏎
❯ curl localhost:8080/
Hello, World!⏎
❯ curl localhost:8080/a
Not Found⏎
❯ curl localhost:8080/a/
Not Found⏎
❯ curl localhost:8080/a/b
Not Found⏎
❯ curl localhost:8080/a/b/
Not Found⏎

Root Cause

It looks like the following is the culprit: https://github.com/mighty-gerbils/gerbil/blob/e55e0806a77f7364c649dbd99ada5972b6f90689/src/tools/gxhttpd/server.ss#L236-L243

The reproduction is above, but to see this behavior in a more confined example:

❯ cat main.ss 
#!/usr/bin/env gxi
(import :std/debug/DBG)

(def (find-handler tab server-path) 
  (displayln "### " server-path)
  (let loop ((path server-path)) 
    (DBG find-handler: path)
    (cond 
     ((string-empty? path) #f) 
     ((hash-get tab path)) 
     ((string-rindex path #\/) 
      => (lambda (index) (loop (substring path 0 index)))) 
     (else #f))))

(def (main . args)
  (define ht (plist->hash-table (list "/" 1 "/sub" 2)))
  (displayln (find-handler ht "/"))
  (displayln (find-handler ht "/a"))
  (displayln (find-handler ht "/sub"))
  (displayln (find-handler ht "/sub/a")))

❯ ./main.ss 
### /
find-handler
  'path => "/"
1
### /a
find-handler
  'path => "/a"
find-handler
  'path => ""
#f
### /sub
find-handler
  'path => "/sub"
2
### /sub/a
find-handler
  'path => "/sub/a"
find-handler
  'path => "/sub"
2

nafarlee avatar Jul 07 '25 22:07 nafarlee

Congrats for having a 1337 bug!

I might not be understanding properly, but if your problem is that find-handler is unable to find / as its root, I believe the issue is that (string-rindex "/foo" #\/) returns 0, so you then do (substring "/foo" 0 0), which returns "", which you match into #f. So either you want to special-case 0 and return "/" instead of the substring, or you want to treat "" the same as "/" rather than as #f.
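
To make those two calls concrete, here is roughly what they evaluate to. This is only a sketch; it uses the same core string-rindex, substring and string-empty? procedures that the confined example above already relies on, and the names index and parent are just for illustration.

```scheme
;; The last "/" in "/foo" is the leading one, so the index is 0 ...
(def index (string-rindex "/foo" #\/))
(displayln index)                     ; prints 0

;; ... and the substring from 0 to 0 is the empty string, which
;; find-handler then maps to #f instead of retrying with "/".
(def parent (substring "/foo" 0 index))
(displayln (string-empty? parent))    ; prints #t
```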

fare avatar Jul 08 '25 02:07 fare

Thanks for calling that out, I didn't notice 😄

To be clear, I am in agreement that find-handler's algorithm is the cause. I opened the issue to confirm that this behavior isn't intentional and, assuming it isn't, to discuss what the preferred fix is.

The simplest thing that occurs to me is to replace...

((string-empty? path) #f)

...with...

((string-empty? path) (hash-get tab "/"))
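
For reference, a sketch of the confined find-handler with that one-line change applied, run against the same table as in the Root Cause section. The DBG tracing is dropped for brevity; nothing here is the actual server.ss code.

```scheme
;; find-handler from the confined example, with the empty-path clause
;; changed to fall back to a "/" entry.
(def (find-handler tab server-path)
  (let loop ((path server-path))
    (cond
     ;; was: ((string-empty? path) #f)
     ((string-empty? path) (hash-get tab "/"))
     ((hash-get tab path))
     ((string-rindex path #\/)
      => (lambda (index) (loop (substring path 0 index))))
     (else #f))))

(def ht (plist->hash-table (list "/" 1 "/sub" 2)))
(displayln (find-handler ht "/"))       ; prints 1 (exact match)
(displayln (find-handler ht "/a"))      ; prints 1 (falls back to "/")
(displayln (find-handler ht "/sub"))    ; prints 2
(displayln (find-handler ht "/sub/a"))  ; prints 2
```

With no "/" entry in the table, the empty-path clause still evaluates to #f, so a server without a root handler keeps the current Not Found behavior.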

Happy to open a PR and add some tests if this looks good. Though there is one more big wrinkle...

As I mentioned, this change would mean a root handler (should someone choose to include one) makes it impossible for servlets and static files to resolve. If we want to allow both to coexist, I think we would need to change the handler/file/servlet resolution process.

Whether we do or don't, this is a breaking change if we consider some users are currently providing a handler on / and:

  • are expecting it to trigger on an exact match only
  • are expecting handlers to match before files/servlets if there are path clashes

Given this, are we still okay with only PRing the small diff shown above?

nafarlee avatar Jul 08 '25 04:07 nafarlee

yeah, with this a root handler will handle everything and mask files and servlets, not ideal... we should avoid this.

vyzo avatar Jul 08 '25 05:07 vyzo

@drewc any opinions on the desired behavior?

vyzo avatar Jul 08 '25 05:07 vyzo

I think the way HAProxy does ACL's is a more "global" solution. Matching with paths alone and dispatching handlers that way is a can of worms.

In other words, we need a way for a request to somehow say "use this handler" beyond exact path matches. Here's a snapshot of a haproxy.cfg that uses ACL's. Essentially, an ACL is just a predicate that may match a request.

First, a couple ACL's that overlap. A lisp_url is a path that begins with "/ecm/". new_reports is also a lisp path, but a longer, more specific one.

acl lisp_url path -i -m beg /ecm/

acl new_reports path -i -m beg /ecm/new/reports

Now, we don't have backends per se, so imagine this is a handler module and/or function, but:

use_backend old_reports if new_reports
use_backend lisp_servers if lisp_url

They operate in order so the dispatch to new_reports happens first. I could reverse them:

use_backend lisp_servers if lisp_url !new_reports
use_backend old_reports if new_reports

But that extra test is not needed.
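
To translate the ordering idea into Gerbil terms, here is a rough sketch where each ACL is a predicate on the path and the rules are tried in order. The names lisp-url?, new-reports?, rules and choose-backend are made up for illustration, and the match is case-sensitive, unlike HAProxy's -i.

```scheme
(import (only-in :std/srfi/13 string-prefix?))

;; Hypothetical stand-ins for the two ACLs: each is a predicate on the path.
(def (lisp-url? path)    (string-prefix? "/ecm/" path))
(def (new-reports? path) (string-prefix? "/ecm/new/reports" path))

;; Rules are tried in order, like the use_backend lines; first match wins,
;; so the more specific ACL goes first and needs no negated test.
(def rules
  (list (cons new-reports? 'old_reports)
        (cons lisp-url?    'lisp_servers)))

(def (choose-backend path)
  (let loop ((rs rules))
    (cond
     ((null? rs) #f)
     (((caar rs) path) (cdar rs))
     (else (loop (cdr rs))))))

(displayln (choose-backend "/ecm/new/reports/q3"))  ; prints old_reports
(displayln (choose-backend "/ecm/login"))           ; prints lisp_servers
(displayln (choose-backend "/static/app.js"))       ; prints #f
```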

The idea of a default root handler is not really a great idea, but we do need a better way to dispatch. Like, what if I wanted a specific handler for spreadsheets?

acl xlsx path -m end .xlsx

Also ACL's can be used for a lot more than handlers and I want Gerbil's httpd to replace my HAProxy use, so imagine:

http-request redirect location /ecm/login/?%[query] code 302 if lisp_login_re !ACL_backdoor

Getting away from string matching paths opens up a whole bunch of other wonderful features and almost gets rid of this current "problem" as the ACL !ACL syntax means that I can choose what part of the request to match, modify the request in various ways, and then pass it off to an actor. If I want to override servlets and files I'm free to do so. Or servlets could be hidden in an ACL.

Using ACL's is a great way to gain a tonne of angles on how an http request should respond.

Especially because paths may not be anywhere near what we think they are :)

http-request replace-path ^/[^/]+/(.*) /\1 if !rpc !lisp_url !spa !entity !old_spa !report !old_assets !empty_url !login !ACL_backdoor !firebase_auth

Thoughts?

--drewc

drewc avatar Jul 08 '25 07:07 drewc

agreed, let's rethink how handlers match.

You guys want to pave the way?

vyzo avatar Jul 08 '25 07:07 vyzo

I don't have enough experience or opinion in this particular topic to lead on this one. Certainly, matching on prefix rather than always exact path would be a first step that would avoid looking at hundreds of rules by focusing on the path first. Then maybe matching per predicate inside a prefix would offer a second level of dispatch (and exact match would be just one such predicate, with a shortcut function to define it).

fare avatar Jul 08 '25 17:07 fare

Here is a workable proposal:

  • handlers are pattern-module pairs
  • in order

So incoming request path tries to match the handler patterns in order. If none matches, it proceeds to resolve from the file system.

So / just matches root, /* matches everything. This resolves the issue quite nicely and is pretty powerful for our needs in the foreseeable future.
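
A minimal sketch of how that ordered matching could look. The pattern syntax is an assumption: only "/" and "/*" come from the proposal, and treating any trailing "*" as "match any suffix" is a guess at how it would generalize. pattern-match?, resolve and the module names are hypothetical.

```scheme
(import (only-in :std/srfi/13 string-prefix?))

;; Assumed pattern syntax: a trailing "*" matches any suffix; anything
;; else must equal the path exactly.
(def (pattern-match? pattern path)
  (let ((len (string-length pattern)))
    (if (and (> len 0) (eqv? (string-ref pattern (- len 1)) #\*))
      (string-prefix? (substring pattern 0 (- len 1)) path)
      (equal? pattern path))))

;; Handlers are (pattern . module) pairs, tried in order. A result of #f
;; means no handler matched, so the server would go on to the file system.
(def (resolve handlers path)
  (cond
   ((null? handlers) #f)
   ((pattern-match? (caar handlers) path) (cdar handlers))
   (else (resolve (cdr handlers) path))))

(def handlers
  '(("/"      . :example/root)
    ("/api/*" . :example/api)))

(displayln (resolve handlers "/"))           ; root pattern, exact match only
(displayln (resolve handlers "/api/users"))  ; wildcard pattern
(displayln (resolve handlers "/a"))          ; no handler: #f
```

Under these assumptions, a final ("/*" . module) entry acts as the catch-all, and leaving it out keeps files and servlets reachable.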

What do you guys think?

vyzo avatar Jul 08 '25 18:07 vyzo

We should probably add an (optional) directory listing handler as well, maybe even .htaccess support.

vyzo avatar Jul 08 '25 18:07 vyzo

How about this extension to make it meta:

  • Handler specs are a list of (pattern-or-predicate? . module-or-handle-request). Both predicate? and handle-request are procedures.
  • Executed in order

I like the wildcard idea.

In my own work I often use a regexp with groups passed as args. So /foo/(.*)/baz/(.*) is passed to (handler req res . regexp-matches). That could possibly be meta-handled by my (proc? . proc) handler, so just bringing it up.
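
Roughly what that looks like with :std/pregexp, as a sketch only. The handler and dispatch procedures are placeholders (req and res are just symbols here), and the anchors in the pattern are added for the example.

```scheme
(import :std/pregexp)

;; Compile once, up front, rather than on every request.
(def foo-baz-rx (pregexp "^/foo/(.*)/baz/(.*)$"))

;; Hypothetical handler: the captured groups arrive as the rest arguments.
(def (handler req res . regexp-matches)
  regexp-matches)

;; pregexp-match returns (whole-match group ...) on success, #f otherwise.
(def (dispatch req res path)
  (cond
   ((pregexp-match foo-baz-rx path)
    => (lambda (m) (apply handler req res (cdr m))))
   (else #f)))

(displayln (dispatch 'req 'res "/foo/a/baz/b"))  ; prints (a b)
(displayln (dispatch 'req 'res "/other"))        ; prints #f
```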

Love the idea of a dir handler and even WebDAV is doable there. And .htaccess or a sexp equiv is a great idea!

drewc avatar Jul 08 '25 19:07 drewc

I am a little reluctant to put a predicate or procedure as it doesn't easily lend to dumb configuration.

We could however use a module reference that could provide a match predicate implementation.

vyzo avatar Jul 08 '25 19:07 vyzo

In that case I'll bow out of this. I can do some different patches so I can start using it without a reverse proxy mapping at some point. High Availability and dynamic updates without restart don't lend themselves to static path dispatch very well.

On the other hand .htaccess and WebDAV or even a directory listing do lend themselves quite well! Especially if .htaccess is in fact a module (with a TTL) that possibly oversees anything underneath it then my use of handlers: (or lack thereof) can continue unabated.

Here's my config in dev. Prod just has a different TTL :)

config: httpd-ensemble-v0
ensemble-servers: (httpd1 httpd2)
ensemble-request-log: #t
server-configuration: (root: "/srv/ecm/app/www/"
                       handlers: ()
                       max-token-length: 10240
                       enable-servlets: #t
                       cache-ttl: 1.
                       listen: ("0.0.0.0:4200"))

drewc avatar Jul 08 '25 19:07 drewc

We can always dynamically modify handlers, we have the mux interface for this (and we can extend it etc), they are not mutually incompatible.

The difficulty lies in how we specify the config, which can't do inline procedures nicely.

vyzo avatar Jul 08 '25 19:07 vyzo

also, by specifying a module that offers a match procedure we get the most general interface... you can do whatever you like in there!

vyzo avatar Jul 08 '25 19:07 vyzo

As long as that module can be "./dynamic" with a ttl and not a :static/compiled one that can't be reloaded, that totally works. And exactly with the mux, my idea was to have it more extensible in the conf as well as overridable.

And yeah, I see how the conf file probably shouldn't import or namespace# functions and where eval takes place is another can of worms there. Ensemble commands should take care of the niggly bits and having a more extensible default mux solves my non-use of handlers: as well.

I think you're on the right track. My needs are a bit far from default so most of my wants are easy extensibility.

drewc avatar Jul 08 '25 20:07 drewc

I don't think we need the ttl thing for the module at all; you can have a matcher/handler that evals/loads dynamically whatever you need... so why complicate the configuration?

Current thinking is that the handler list elements are either a module (which provides both the match with a handle-request? export and the handler with handle-request), or a pair of path regexp and a module.
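
A sketch of what the first form (a module that does its own matching) might look like, based on the handler.ss from the reproduction. The handle-request? export is only the name proposed here, its one-argument signature is an assumption, and api-path? is a made-up helper.

```scheme
;; handler.ss: sketch of a self-matching handler module.
;; handle-request? is the export proposed in this thread; taking just the
;; request as its argument is an assumption.
(import (only-in :std/net/httpd
                 http-request-path
                 http-response-write)
        (only-in :std/srfi/13 string-prefix?))

(export handle-request? handle-request)

;; Pure helper so the matching rule can be exercised without a server.
(def (api-path? path)
  (string-prefix? "/api/" path))

;; Returns true to claim the request, #f to let the next entry try.
(def (handle-request? req)
  (api-path? (http-request-path req)))

(def (handle-request req res)
  (http-response-write res 200 [] "Hello from /api/\n"))
```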

How does this sound?

vyzo avatar Jul 08 '25 21:07 vyzo

How about a composable setup, where you can have a handler-for-everything, a prefix-dispatching handler, a list-of-predicate-handler-pairs (where a regex or simpler string pattern is seen as a predicate), or maybe something else? Then the server hands it all to the main handler, that in turn may either be some kind of dispatcher or something else.

fare avatar Jul 09 '25 00:07 fare

Ah yes, handle-request? makes perfect sense, and having the list be either modules or (path? . module) can do the composing that Fare is referring to, so yup, you've got it. As long as we compile the regexps at init time, there's no performance hit vs string=?, so I think you've worked it out :)

drewc avatar Jul 09 '25 00:07 drewc

... and if we made a (current-httpd-match-result) bound with the truthy value of the predicate, and usually that predicate is a regexp match, then I can call that parameter in the handler to get my regexp groups, and ("^/foo/(.*)" . :bar/bar) as a pair gives me the "arguments" I like without an extra performance hit. Ya?
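
Something like this, as a sketch. current-httpd-match-result does not exist in httpd; it is just the parameter floated above, and bar-handler and dispatch are stand-ins for the :bar/bar handler and the server's dispatch.

```scheme
(import :std/pregexp)

;; Hypothetical parameter holding the truthy result of the predicate.
(def current-httpd-match-result (make-parameter #f))

;; Stand-in for the handler in :bar/bar; it reads the regexp groups
;; from the parameter instead of matching the path again.
(def (bar-handler)
  (cdr (current-httpd-match-result)))

(def foo-rx (pregexp "^/foo/(.*)"))

(def (dispatch path)
  (cond
   ((pregexp-match foo-rx path)
    => (lambda (m)
         (parameterize ((current-httpd-match-result m))
           (bar-handler))))
   (else #f)))

(displayln (dispatch "/foo/x/y"))  ; prints (x/y)
(displayln (dispatch "/nope"))     ; prints #f
```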

drewc avatar Jul 09 '25 00:07 drewc

For recursive handler definition, maybe we can have each handler take both the entire request and a "focus" as arguments, and the focus allows path-based handlers to deconstruct what happens based on relative path, extension-handlers to handle things based on extensions, etc.

fare avatar Jul 09 '25 00:07 fare

parameterization itself is not free, it might end up being slower than rematching.

vyzo avatar Jul 09 '25 05:07 vyzo

@fare you are overthinking here 🐈‍⬛

vyzo avatar Jul 09 '25 05:07 vyzo

If parameterization is that expensive then ok, you win... in fact not sure what I was thinking, as chances are the request already went through a few regexp predicates, so my performance worries are for naught: adding another is negligible.

But how about a set!-able state slot in the request? That means the predicate itself can be very complex and time consuming but output some things for the handler to look at. Or even modify on fail to mark the request in some way for future predicates? I'm thinking more for a proxy but it does allow that overthinking recursion as well, and it is as simple as typing the symbol state in the struct definition.

Module or Regexp.Module sounds perfect as is. Giv'r eh!

drewc avatar Jul 10 '25 00:07 drewc

We can add a specific slot in the request, owned by the user; that should do.
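
A small sketch of how such a slot could be used. demo-request is a made-up struct standing in for the real request, and state is only a guess at the slot name; the point is just that the predicate can leave something behind for the handler.

```scheme
(import :std/pregexp)

;; demo-request is a hypothetical stand-in for the real request struct;
;; the user-owned state slot is the only part that matters here.
(defstruct demo-request (path state))

;; The predicate does the (possibly expensive) match once and stores the
;; captured group in the slot for the handler to pick up.
(def (report-request? req)
  (let ((m (pregexp-match "^/reports/([0-9]+)$" (demo-request-path req))))
    (when m (demo-request-state-set! req (cadr m)))
    (and m #t)))

(def (report-handler req)
  (string-append "report " (demo-request-state req)))

(def req (make-demo-request "/reports/42" #f))
(when (report-request? req)
  (displayln (report-handler req)))  ; prints report 42
```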

vyzo avatar Jul 10 '25 04:07 vyzo

Yup. :thumbsup:. That seems to do the trick. Simple configuration, extensible, performant, and request-specific is actually less specific than request-state so the stoner in me loves the juxtaposition.

drewc avatar Jul 10 '25 04:07 drewc

It seems we will have to change the Mux interface to take an http-request object in order to resolve.

We should probably keep a simplified Mux interface with the old signature to ease the transition and not break existing code.

vyzo avatar Jul 10 '25 14:07 vyzo

Yeah makes sense. Perhaps the gxhttp-mux is a subclass and we don't need to change existing?

drewc avatar Jul 10 '25 22:07 drewc

we do, the server uses the interface.

vyzo avatar Jul 11 '25 07:07 vyzo

I have been trying to follow along with the discussion thus far, but as a Gerbil newcomer, I can't say I understand the immediate next steps.

To the extent you are interested in "slicing" all of this into help-wanted issues, I would be happy to take on one or more.

nafarlee avatar Jul 15 '25 04:07 nafarlee

no worries, this is not beginner level; we will handle it.

vyzo avatar Jul 15 '25 07:07 vyzo