unit icon indicating copy to clipboard operation
unit copied to clipboard

Optionally enable REQUEST_URI rewrites

Open scheibling opened this issue 7 months ago • 7 comments

Feature Overview

Just to start off, a massive thanks to you all for your work on this. We use it enthusiastically for a number of things, and we appreciate you all!

With a recent update of one of our custom production builds of this (alpine + custom PHP image), we noticed that the rewrites for the REQUEST_URI (https://github.com/nginx/unit/commit/87077ec4ba9a59f3332c3758a3c39a5c149025e5) were disabled.

Would it be an option to make it configurable either on compile or settings-wise?

Alternatives Considered

The adding of custom (dynamic) request headers to the backend would be an alternative, to be able to send something like X_REWRITTEN_REQUEST_URI would maybe align better with the roadmap and be more generally useful

Adding a second url_rewrite/uri_rewrite field with the old functionality maybe is probably the easiest alternative

Additional Context

Even though this is not strictly necessary, and probably not useful for 90% of users we have some edge cases where we'd like it to work.

With that said - at this point this is mostly a question of if any of these alternatives would be acceptable in terms of direction. I would be happy to find the time to work/help work on a PR for an implementation of one of them.

scheibling avatar May 14 '25 15:05 scheibling

Hi, glad you're enjoying Unit!

Right, OK. This was always going to be a possible eventuality.

I guess this counts as a regression then...

Here;s a PoC patch that stuffs the rewritten URL in a new target_mod struct member, available in PHP via a UNIT_REQUEST_URI_MOD server variable.

E.g.

$ curl localhost:8080/index.php?q=a
Array
(
    [SERVER_SOFTWARE] => Unit/1.35.0
    [SERVER_PROTOCOL] => HTTP/1.1
    [PHP_SELF] => /v1/index.php
    [SCRIPT_NAME] => /v1/index.php
    [SCRIPT_FILENAME] => /home/andrew/src/php/v1/index.php
    [DOCUMENT_ROOT] => /home/andrew/src/php
    [REQUEST_METHOD] => GET
    [REQUEST_URI] => /index.php?q=a
    [UNIT_REQUEST_URI_MOD] => /v1/index.php
    [QUERY_STRING] => q=a
    [REMOTE_ADDR] => ::1
    [SERVER_ADDR] => ::1
    [SERVER_NAME] => localhost
    [SERVER_PORT] => 80
    [HTTP_HOST] => localhost:8080
    [HTTP_USER_AGENT] => curl/8.11.1
    [HTTP_ACCEPT] => */*
    [REQUEST_TIME_FLOAT] => 1747330651.5636
    [REQUEST_TIME] => 1747330651
}

/index.php?q=a was the original requested URL and /v1/index.php the rewritten URL (minus duplicating the query string which doesn't change).

If there is no rewrite then it's just set as an empty string, i.e. [UNIT_REQUEST_URI_MOD] =>

diff --git ./src/nxt_http.h ./src/nxt_http.h
index 79a0ad96..c052b3e1 100644
--- ./src/nxt_http.h
+++ ./src/nxt_http.h
@@ -149,6 +149,7 @@ struct nxt_http_request_s {
     nxt_str_t                       server_name;
     nxt_str_t                       request_line;
     nxt_str_t                       target;
+    nxt_str_t                       target_mod;
     nxt_str_t                       version;
     nxt_str_t                       *method;
     nxt_str_t                       *path;
diff --git ./src/nxt_http_rewrite.c ./src/nxt_http_rewrite.c
index 5de15ed7..1dbb0ed1 100644
--- ./src/nxt_http_rewrite.c
+++ ./src/nxt_http_rewrite.c
@@ -70,6 +70,9 @@ nxt_http_rewrite(nxt_task_t *task, nxt_http_request_t *r)
         return NXT_ERROR;
     }
 
+    r->target_mod.start = rp.target_start;
+    r->target_mod.length = rp.target_end - rp.target_start;
+
     r->path = nxt_mp_alloc(r->mem_pool, sizeof(nxt_str_t));
     if (nxt_slow_path(r->path == NULL)) {
         return NXT_ERROR;
diff --git ./src/nxt_php_sapi.c ./src/nxt_php_sapi.c
index da667b66..b9d9754a 100644
--- ./src/nxt_php_sapi.c
+++ ./src/nxt_php_sapi.c
@@ -1490,6 +1490,8 @@ nxt_php_register_variables(zval *track_vars_array TSRMLS_DC)
                      track_vars_array TSRMLS_CC);
     nxt_php_set_sptr(req, "REQUEST_URI", &r->target, r->target_length,
                      track_vars_array TSRMLS_CC);
+    nxt_php_set_sptr(req, "UNIT_REQUEST_URI_MOD", &r->target_mod,
+                     r->target_mod_length, track_vars_array TSRMLS_CC);
     nxt_php_set_sptr(req, "QUERY_STRING", &r->query, r->query_length,
                      track_vars_array TSRMLS_CC);
 
diff --git ./src/nxt_router.c ./src/nxt_router.c
index af9dad29..404ded57 100644
--- ./src/nxt_router.c
+++ ./src/nxt_router.c
@@ -5501,6 +5501,7 @@ nxt_router_prepare_msg(nxt_task_t *task, nxt_http_request_t *r,
                + nxt_sockaddr_port_length(r->local) + 1
                + r->server_name.length + 1
                + r->target.length + 1
+               + r->target_mod.length + 1
                + (r->path->start != r->target.start ? r->path->length + 1 : 0);
 
     content_length = r->content_length_n < 0 ? 0 : r->content_length_n;
@@ -5580,6 +5581,11 @@ nxt_router_prepare_msg(nxt_task_t *task, nxt_http_request_t *r,
     p = nxt_cpymem(p, r->target.start, r->target.length);
     *p++ = '\0';
 
+    req->target_mod_length = (uint32_t) r->target_mod.length;
+    nxt_unit_sptr_set(&req->target_mod, p);
+    p = nxt_cpymem(p, r->target_mod.start, r->target_mod.length);
+    *p++ = '\0';
+
     req->path_length = (uint32_t) r->path->length;
     if (r->path->start == r->target.start) {
         nxt_unit_sptr_set(&req->path, target_pos);
diff --git ./src/nxt_unit_request.h ./src/nxt_unit_request.h
index a6ebf0b6..1584359b 100644
--- ./src/nxt_unit_request.h
+++ ./src/nxt_unit_request.h
@@ -25,6 +25,7 @@ struct nxt_unit_request_s {
     uint8_t               app_target;
     uint32_t              server_name_length;
     uint32_t              target_length;
+    uint32_t              target_mod_length;
     uint32_t              path_length;
     uint32_t              query_length;
     uint32_t              fields_count;
@@ -43,6 +44,7 @@ struct nxt_unit_request_s {
     nxt_unit_sptr_t       local_port;
     nxt_unit_sptr_t       server_name;
     nxt_unit_sptr_t       target;
+    nxt_unit_sptr_t       target_mod;
     nxt_unit_sptr_t       path;
     nxt_unit_sptr_t       query;
     nxt_unit_sptr_t       preread_content;

I suppose the main problem is there is no standard name for this... though if we use our own UNIT_ namespace... on the plus side, it's a fairly minimal patch...

I also guess this sentence in the docs is wrong since 87077ec

It does not affect the query but changes the uri and $request_uri variables.

ac000 avatar May 15 '25 18:05 ac000

Magical! And here I was getting ready to sacrifice my weekend for this... 😆 Thank you!

From my experience, the X-Rewrite-Url/X-Rewrite-Uri was more common a lot of years ago, which also means there might still be software with residual handling of those headers. So my vote would be for UNIT_REQUEST_URI_MOD like you propose, that makes it pretty clear what's going on too.

I'll pull this into one of our test containers and set up a few of our staging sites with it to see if it works/is stable when I get the chance. Had to go back to 1.30 for the moment since we're seeing some variation of #1606 since the 1.34 update (working on a stack trace, I'll post that over there).

scheibling avatar May 15 '25 20:05 scheibling

Magical! And here I was getting ready to sacrifice my weekend for this... 😆 Thank you!

Heh...

From my experience, the X-Rewrite-Url/X-Rewrite-Uri was more common a lot of years ago, which also means there might still be software with residual handling of those headers. So my vote would be for UNIT_REQUEST_URI_MOD like you propose, that makes it pretty clear what's going on too.

Looks like there are security concerns with it being a header, so a server variable does look like the way to go...

I'll pull this into one of our test containers and set up a few of our staging sites with it to see if it works/is stable when I get the chance. Had to go back to 1.30 for the moment since we're seeing some variation of https://github.com/nginx/unit/issues/1606 since the 1.34 update (working on a stack trace, I'll post that over there).

Hmm, interesting... I'll see what you dig up and I my have an ask of you...

ac000 avatar May 15 '25 23:05 ac000

Here's a much simpler patch!

diff --git ./src/nxt_php_sapi.c ./src/nxt_php_sapi.c
index b9d9754a..64a88786 100644
--- ./src/nxt_php_sapi.c
+++ ./src/nxt_php_sapi.c
@@ -1459,6 +1459,9 @@ nxt_php_register_variables(zval *track_vars_array TSRMLS_DC)
                         track_vars_array TSRMLS_CC);
     }
 
+    nxt_php_set_sptr(req, "UNIT_REWRITTEN_PATH", &r->path, r->path_length,
+                     track_vars_array TSRMLS_CC);
+
     /*
      * 'SCRIPT_NAME'
      * Contains the current script's path.  This is useful for pages which need

Also renamed the PHP variable to UNIT_REWRITTEN_PATH to be more accurate.

The main difference with the other patch is this variable will always be set whether the URL was rewritten or not...

$ curl localhost:8080/index.php?q=a
Array
(
    [SERVER_SOFTWARE] => Unit/1.35.0
    [SERVER_PROTOCOL] => HTTP/1.1
    [PHP_SELF] => /v1/index.php
    [UNIT_REWRITTEN_PATH] => /v1/index.php
    [SCRIPT_NAME] => /v1/index.php
    [SCRIPT_FILENAME] => /home/andrew/src/php/v1/index.php
    [DOCUMENT_ROOT] => /home/andrew/src/php
    [REQUEST_METHOD] => GET
    [REQUEST_URI] => /index.php?q=a
    [QUERY_STRING] => q=a
    [REMOTE_ADDR] => ::1
    [SERVER_ADDR] => ::1
    [SERVER_NAME] => localhost
    [SERVER_PORT] => 80
    [HTTP_HOST] => localhost:8080
    [HTTP_USER_AGENT] => curl/8.11.1
    [HTTP_ACCEPT] => */*
    [REQUEST_TIME_FLOAT] => 1747678336.4721
    [REQUEST_TIME] => 1747678336
)

Of course as in the above, if you're not using the script option then PHP_SELF & SCRIPT_NAME has what you want, whereas when using script

$ curl localhost:8080/index.php?q=a
Array
(
    [SERVER_SOFTWARE] => Unit/1.35.0
    [SERVER_PROTOCOL] => HTTP/1.1
    [PHP_SELF] => /index.php
    [UNIT_REWRITTEN_PATH] => /v1/index.php
    [SCRIPT_NAME] => /index.php
    [SCRIPT_FILENAME] => /home/andrew/src/php/index.php
    [DOCUMENT_ROOT] => /home/andrew/src/php
    [REQUEST_METHOD] => GET
    [REQUEST_URI] => /index.php?q=a
    [QUERY_STRING] => q=a
    [REMOTE_ADDR] => ::1
    [SERVER_ADDR] => ::1
    [SERVER_NAME] => localhost
    [SERVER_PORT] => 80
    [HTTP_HOST] => localhost:8080
    [HTTP_USER_AGENT] => curl/8.11.1
    [HTTP_ACCEPT] => */*
    [REQUEST_TIME_FLOAT] => 1747678668.2806
    [REQUEST_TIME] => 1747678668
)

But I guess you do use the script option...

ac000 avatar May 19 '25 18:05 ac000

@victorserbu2709

Would the above work for you?

ac000 avatar May 19 '25 18:05 ac000

Hello @ac000 Not without modifying the application to use that new variable (but I can modify it).

Context: I have an application deployed using nginx + phpfpm, with locations like this

location ^~ /lapi/ {
...
 rewrite "^/lapi/jsonrpc" "/jsonrpc.php" break;
 rewrite "^/lapi/rest/latest/" "/rest.php" break;
 rewrite "^/lapi/test$" "/test.php" break;
...
  fastcgi_pass fastcgi_web_backend;
  fastcgi_param REQUEST_URI $request_uri;
...
}

the php app uses REQUEST_URI. I tried to leave rewrites in nginx and replace php-fpm with nginx-unit and the simpler approach was to pass from nginx a header containing request_uri

proxy_pass http://127.0.0.1:9095; #nginx-unit
proxy_set_header X-Request-Uri $request_uri;

But I have also nginx location with rewrites like:

    rewrite "/fax/([0-9]+)/([a-zA-Z0-9_]+)$" /fax/index.php?id=$1&uniqueid=$2 break;

Meanwhile I found how can this be done in nginx unit: https://github.com/nginx/unit/issues/992#issuecomment-1794779569

I could do the rewrite entirely in nginx-unit, but I think nginx will be however needed for modsecurity, this is why I consider it would be useful to be able to overwrite REQUEST_URI in nginx unit.

victorserbu2709 avatar May 20 '25 10:05 victorserbu2709

would be useful to be able to overwrite REQUEST_URI in nginx unit.

I don't think that's an option.

I think the only sane way to do this is with a new server variable. Just trying to decide the impact of not providing it and whether it's worth it or not...

ac000 avatar May 27 '25 02:05 ac000