frankenphp
use escaped path instead of unescaped path
It appears that nginx + fpm will not unescape a path, while FrankenPHP will. Thus the following url is a 404 with nginx + fpm: https://withinboredom.info/2024/08/12%2foptimizing-cgo-handles/
However, with FrankenPHP, it gets translated into https://withinboredom.info/2024/08/12/optimizing-cgo-handles/ and is not a 404.
Applications that expect the original path (//analytics/original%2fpath) will hit an error as a result. Instead, we should send PHP the path as it was actually requested.
This PR sends the escaped path instead of the unescaped path, which still sanitizes the URL but leaves the original escapes intact.
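To illustrate the difference between the two forms, here is a minimal sketch using Go's standard net/url package. It is not the code from this PR; the helper name pathForms is made up for the example, and it only shows that the decoded and escaped paths diverge for a URL containing %2f.

```go
package main

import (
	"fmt"
	"net/url"
)

// pathForms (hypothetical helper) parses a request target and returns
// the decoded path (u.Path) and the escaped path (u.EscapedPath()).
func pathForms(target string) (decoded, escaped string, err error) {
	u, err := url.ParseRequestURI(target)
	if err != nil {
		return "", "", err
	}
	return u.Path, u.EscapedPath(), nil
}

func main() {
	decoded, escaped, err := pathForms("/2024/08/12%2foptimizing-cgo-handles/")
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded:", decoded) // decoded: /2024/08/12/optimizing-cgo-handles/
	fmt.Println("escaped:", escaped) // escaped: /2024/08/12%2foptimizing-cgo-handles/
}
```

The decoded form is what turns the encoded slash into a real path separator; the escaped form keeps the request as the client sent it.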
Are there any docs about the expected behavior? Could we check what Apache, Caddy+FPM and the built-in PHP development server do, to be sure we are consistent?
Looking at the output of php -S (at least as the canonical implementation, will compare other sapis tomorrow):
- REQUEST_URI: /foo%2fbar
- SCRIPT_NAME: /foo/bar
- PHP_SELF: /foo/bar
So this PR will get a bit more complicated (for example, splitPos for docURI won't work), but we can revert most of this commit. I'll compare with other SAPIs and try to find documentation or specs that describe the proper behavior here.
The values are somewhat documented here: https://www.php.net/manual/en/reserved.variables.server.php
Specifically:
REQUEST_URI: The URI which was given in order to access this page; for instance, '/index.html'.
SCRIPT_NAME: Contains the current script's path. This is useful for pages which need to point to themselves. The __FILE__ constant contains the full path and filename of the current (i.e. included) file.
PHP_SELF: The filename of the currently executing script, relative to the document root. For instance, $_SERVER['PHP_SELF'] in a script at the address http://example.com/foo/bar.php would be /foo/bar.php. The __FILE__ constant contains the full path and filename of the current (i.e. included) file. If PHP is running as a command-line processor this variable contains the script name.
Which suggests a 'literal', non-processed URI for REQUEST_URI. Digging into the comments on this page, I was able to get a little more information from people who have studied the behavior of these variables across time and SAPIs (I love comments on these pages).
REQUEST_URI:
- should include query parameters
- before any rewriting (e.g., raw URL after the domain, as the user typed it)
- includes leading slash
SCRIPT_NAME:
- filename of the entry point script (e.g., index.php; should be constructable by removing DOCUMENT_ROOT from PHP_SELF)
- excludes PATH_INFO
- excludes query parameters
PHP_SELF:
- URL after all rewrites and processing
- includes PATH_INFO
- excludes query parameters
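The three lists above can be sketched roughly as follows. This is an illustration under assumptions, not FrankenPHP's implementation: serverVars and buildServerVars are hypothetical names, and the entry point script is assumed to have been resolved by the server already and passed in as scriptName.

```go
package main

import (
	"fmt"
	"net/url"
)

// serverVars (hypothetical type) holds the three values discussed above.
type serverVars struct {
	RequestURI string // raw request target, query string included
	ScriptName string // entry point script, no PATH_INFO, no query
	PHPSelf    string // decoded path with PATH_INFO, no query
}

// buildServerVars (hypothetical helper) derives the values from the raw
// request target. scriptName is assumed to be resolved by the caller.
func buildServerVars(rawTarget, scriptName string) (serverVars, error) {
	u, err := url.ParseRequestURI(rawTarget)
	if err != nil {
		return serverVars{}, err
	}
	return serverVars{
		RequestURI: rawTarget,  // untouched: escapes and query kept
		ScriptName: scriptName, // as resolved by the server
		PHPSelf:    u.Path,     // decoded, query stripped
	}, nil
}

func main() {
	vars, err := buildServerVars("/index.php/foo%2fbar?x=1", "/index.php")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", vars)
}
```

Only REQUEST_URI keeps the %2f here; the other two values work on the decoded path, which matches the php -S output quoted earlier.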
I also noted that we may not have the correct values for PATH_INFO:
PATH_INFO: Contains any client-provided pathname information trailing the actual script filename but preceding the query string, if available. For instance, if the current script was accessed via the URI http://www.example.com/php/path_info.php/some/stuff?foo=bar, then $_SERVER['PATH_INFO'] would contain /some/stuff.
This should be the part of the path after the PHP script: for /index.php/some/path, PATH_INFO should be /some/path, which is different from REQUEST_URI.
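A minimal sketch of that split, assuming the script is identified by the first occurrence of an extension such as ".php" in the decoded path. The function name splitPathInfo and the case-sensitive match are choices made for this example only.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPathInfo (hypothetical helper) splits a decoded URL path, without
// its query string, into the script name and PATH_INFO at the first
// occurrence of ext. The match is case-sensitive in this sketch.
func splitPathInfo(path, ext string) (scriptName, pathInfo string) {
	i := strings.Index(path, ext)
	if i < 0 {
		// No script extension found: nothing to split off.
		return path, ""
	}
	end := i + len(ext)
	return path[:end], path[end:]
}

func main() {
	script, info := splitPathInfo("/index.php/some/path", ".php")
	fmt.Println(script) // /index.php
	fmt.Println(info)   // /some/path
}
```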
Apache and Nginx
Apache has AllowEncodedSlashes to process these URLs. The typical use case looks like /example/http:%2F%2Fwww.someurl.com/, for proxies and potentially for retrieving screenshots. Apache considers these URLs invalid unless this setting is On or NoDecode (and decodes the slashes or leaves them encoded, depending on the value).
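As a rough model of those three modes (not Apache's actual code, and covering only %2F), the behavior could be sketched like this. The function name resolveEncodedSlashes and the string mode values are assumptions for the example; NoDecode here also normalizes the escape to uppercase %2F.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// encodedSlash matches %2F in either case.
var encodedSlash = regexp.MustCompile(`(?i)%2f`)

// resolveEncodedSlashes (hypothetical helper) returns the path handed to
// the application and whether the request is accepted at all.
func resolveEncodedSlashes(rawPath, mode string) (string, bool) {
	switch mode {
	case "On":
		// Accept and decode everything, encoded slashes included.
		p, err := url.PathUnescape(rawPath)
		return p, err == nil
	case "NoDecode":
		// Accept, decode other escapes, keep encoded slashes encoded.
		parts := encodedSlash.Split(rawPath, -1)
		for i, part := range parts {
			p, err := url.PathUnescape(part)
			if err != nil {
				return "", false
			}
			parts[i] = p
		}
		return strings.Join(parts, "%2F"), true
	default: // "Off"
		// Reject any path containing an encoded slash.
		if encodedSlash.MatchString(rawPath) {
			return "", false
		}
		p, err := url.PathUnescape(rawPath)
		return p, err == nil
	}
}

func main() {
	for _, mode := range []string{"Off", "On", "NoDecode"} {
		p, ok := resolveEncodedSlashes("/example/http:%2F%2Fwww.someurl.com/", mode)
		fmt.Println(mode, ok, p)
	}
}
```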
For nginx, it depends on which variable you use in your configuration. If you use $uri, it will pass on the unescaped version, while $request_uri is the original escaped version. Looking at the default fastcgi parameters that came with my nginx installation (Ubuntu 24.04), it sets REQUEST_URI to $request_uri.
In both servers, whether the URL is decoded before it is passed to PHP is configurable.
I'm going to close this PR and move the above text into a separate issue.