Pico
Pico copied to clipboard
Nginx rewrite rule in documentation is incorrect
The currently given rule in the example config is:
location ~ ^/((config|content|vendor|composer\.(json|lock|phar))(/|$)|(.+/)?\.(?!well-known(/|$))) {
try_files /index.php$is_args$args =404;
}
As was mentioned before in #572 (which was closed while unresolved) this doesn't work. The rule should instead be something like this:
location ~ ^/(config|content|vendor|composer) {
rewrite /(.*)$ /index.php?$1;
}
That will redirect any requests to those directories or their contents to instead be handled by Pico as content paths
Yeah, a simply wrong sample config might explain why we see support requests for nginx all the time :see_no_evil: Apart from the regexes not being equivalent, can you elaborate why try_files
doesn't work, but rewrite
apparently does work? I don't use nginx myself, thus I can only read the docs, but I always thought that the only difference is that try_files
won't pass the original URL to Pico (what is intentional actually).
Sure!
rewrite
is basically like an internal redirect within nginx itself - it takes the request URL, changes it according to the rewrite rule, and passes it back to itself to actually process. Only the final rewritten version actually reaches the site.
try_files
, as you might expect from the name, checks to see if files exists according to the list of parameters given and if so, returns them - except for the last parameter provided, which I forget the term for but is basically an internal rule. So what the root try_files $uri $uri/ /index.php$is_args$args
directive does is take the uri, check if it's a path for a file, then check if it's a path for a directory, and then the last parameter is loosely equivalent to a rewrite
So what try_files /index.php$is_args$args =404
does is check if the file index.php(?stuff) exists: if so, it returns index.php as a file; if not, it uses the =404 rule and returns a server 404 page. This means anything caught by the existing rule just returns Pico's index.php page as a file (because returning it as a file parameter from the try_files
directive means it doesn't go through the .php fpm)
The other issue with the existing rule is that it only catches paths that are .json .lock or .phar files, meaning that the .yml config files or .php plugin files or etc. can be just fetched directly.
So, since try_files
is only used when you want to check for and potentially return files based on the uri, and we don't want to ever return any of the files in those folders but instead hand the uri to pico to handle, you want to use rewrite
.
(As a disclaimer, I would definitely not call myself an expert on nginx, just a long-time user, so there might be some other technical differences that aren't visible in practice, but everything I wrote here is accurate to the best of my knowledge.)
This means anything caught by the existing rule just returns Pico's index.php page as a file (because returning it as a file parameter from the
try_files
directive means it doesn't go through the .php fpm)
Ok, wow, this definitely explains why this doesn't work :see_no_evil:
Thank you very much for your explanation! Helped a lot to understand what is going on. :heart:
So try_files
can't ever work here. I've some more questions:
-
Does
rewrite
send the originalREQUEST_URI
as env variable to php-fpm, or only the rewritten URL? I'm asking because appending the requested page to the rewritten URL (i.e. passing it as query parameter) is very error prone, especially in combination with other query parameters (e.g. for requests likehttps://example.com/pico/content/my-page?var1=foo&var2=bar
)? That's why we switched fromQUERY_STRING
based URL rewriting toREQUEST_URI
with Apache. If nginx only sends the rewritten URL to php-fpm, is there a way to change that? Pico isn't any special about this, AFAIK most CMS require it this way. -
rewrite
doesn't send a redirect to the user's browser, but rather rewrites the request internally (i.e. the user doesn't see the rewritten URL in the address bar), right? -
The proposed alternative doesn't match the original regex's functionality (or at least what we tried to achieve with it, even though it apparently never worked). The new config must achieve the following:
- Always send any request to the
config/
,content/
,content-sample/
,lib/
, andvendor/
directories (content-sample/
andlib/
were missing before), as well as theCHANGELOG.md
,composer.json
,composer.lock
, andcomposer.phar
files (CHANGELOG.md
was missing before), to Pico, no matter whether the file exists or not. - Always send any request to any file or directory starting with a
.
to Pico, no matter whether the file exists or not. - As an exception to no. 2, never send any request to the
.well-known/
directory (or any.well-known/
deeper in the tree) to Pico. - If a file or directory exists on the filesystem, serve it as usual (i.e. don't send the request to Pico). The request should be treated like no other rules were specified, i.e. requesting a
.php
file (e.g. in Pico'splugins/
directory) should indeed cause nginx to pass this request to php-fpm. In the original (broken) example config this was implemented with thelocation /
section. - Send anything else to Pico. Again, originally implemented with the
location /
section.
The original regex
^/((config|content|vendor|composer\.(json|lock|phar))(/|$)|(.+/)?\.(?!well-known(/|$)))
corresponds to the first three rules; here's a visualization of the regex (using Regexper):Basically the nginx config is supposed to match Pico's default Apache config: https://github.com/picocms/Pico/blob/09aa82578710d82dd7dc482febe32991be0ea307/.htaccess#L2-L14
- Always send any request to the
I just tasked ChatGPT with this - and it failed miserably at first giving the instructions above, even after multiple correction loops :laughing: However, tasking it with adapting the Apache config for nginx at least provided something that looks promising. However, I did many tests with ChatGPT and it often creates stuff that looks promising, but is very wrong in reality. Can you take a look?
server {
listen 80;
server_name example.com;
root /var/www/example.com;
location / {
index index.php;
try_files $uri $uri/ /index.php?$args;
}
location ~ /(config|content|content-sample|lib|vendor)(/|$)|^(CHANGELOG\.md|composer\.(json|lock|phar))(/|$)|(^\.|/\.)(?!well-known(/|$)) {
rewrite ^(.*)$ /index.php last;
}
location ~ \.php$ {
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_pass unix:/run/php/php7.2-fpm.sock; # Replace with appropriate PHP version and path
}
}
I don't know much about nginx, but from just comparing it with the original config and yours, it looks like $is_args
is missing for try_files
, and the last
in rewrite
might cause the request to never reach php-fpm?
Does rewrite send the original REQUEST_URI as env variable to php-fpm, or only the rewritten URL?
Ahhh, I missed that this is how it works! I'm a lot weaker on how Pico works than I am on how NGINX works and the docs I saw all seemed to use the path as a query parameter, so that's what I used when trying to fix it. 😅
You are right that rewrite
changes the $request_uri
when it runs. Preserving the original URI is possible but a little trickier, I'll get back to you on that.
Aside from that, here's a revision of what chat GPT gave you:
server {
listen 80;
server_name example.com;
root /var/www/example.com;
# this should be at the server level
index index.php;
# regex is not my strong point but i believe what you need is:
# (^/(config|content|content-sample|lib|vendor)/) - match locations starting with (i.e. in) these directories
# (^/(CHANGELOG\.md|composer\.(json|lock|phar))$) - match locations of these exact files
# (/\.well-known/) - match locations containing this string anywhere
# joined with | to match any of them
location ~ (^/(config|content|content-sample|lib|vendor)/)|(^/(CHANGELOG\.md|composer\.(json|lock|phar))$)|(/\.well-known/) {
# last means that it won't rewrite it again - it'll still get processed by fpm. it's necessary for situations where your rewrites might cause loops.
# so it's not necessary here but doesn't hurt
rewrite .* /index.php$is_args$args last;
}
# it doesn't matter as much here because regex locations get parsed separately
# but it's good practice to remember not to put your root location at the top so it doesn't override all your other locations
location / {
try_files $uri $uri/ /index.php$is_args$args;
}
# i copied this from my existing config
location ~ \.php$ {
include snippets/fastcgi-php.conf;
fastcgi_param SCRIPT_FILENAME $request_filename;
fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Replace with appropriate PHP version and path
}
}
All right, I tested this solution for the URI and it seems to work!
server {
# the listen/server_name/etc directives
set $original_request_uri $request_uri;
# ...
location ~\.php$ {
include snippets/fastcgi-php.conf;
fastcgi_param SCRIPT_FILENAME $request_filename;
fastcgi_param REQUEST_URI $original_request_uri; # use original URI
fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Replace with appropriate PHP version and path
}
For conciseness, here's all of it in one place:
server {
listen 80;
server_name example.com;
root /var/www/example.com;
index index.php;
set $original_request_uri $request_uri;
location ~ (^/(config|content|content-sample|lib|vendor)/)|(^/(CHANGELOG\.md|composer\.(json|lock|phar))$)|(/\.well-known/) {
rewrite .* /index.php$is_args$args last;
}
location / {
try_files $uri $uri/ /index.php$is_args$args;
}
location ~ \.php$ {
include snippets/fastcgi-php.conf;
fastcgi_param SCRIPT_FILENAME $request_filename;
fastcgi_param REQUEST_URI $original_request_uri;
fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Replace with appropriate PHP version and path
}
}
I've updated the regex, rewrite, and URI directives in my own configuration to match this and tried them out, and it's all working correctly. Normal pages show up normally, normal files show up normally (including a root config.txt
I made just for this), and attempts to access any of the blocked locations are correctly returning Pico's own 404 pages.
Thanks again @InspectorCaracal! :heart:
location ~ (^/(config|content|content-sample|lib|vendor)/)|(^/(CHANGELOG\.md|composer\.(json|lock|phar))$)|(/\.well-known/) {
I remember that there were some issues with specially crafted requests, requiring (/|$)
after any path, no matter whether it's a directory or file. For files this was something about PATH_INFO
support (even though Pico doesn't use it, PHP still fiddles with it; also see below), and for directories it's necessary because one can request a directory without the trailing /
, too. I guess we better be safe than sorry. Also note that the .well-known
part matches the directory, even though it is supposed to not match it, but any other file and directory starting with a dot. That's why the original regex uses a negative lookahead ((?!)
). I believe that the following regex should be correct, can you please give it a try?
location ~ ^/((config|content|content-sample|lib|vendor|CHANGELOG\.md|composer\.(json|lock|phar))(/|$)|(.+/)?\.(?!well-known(/|$))) {
rewrite .* /index.php$is_args$args last;
I guess we can make the regex even simpler (i.e. faster) with the following. Can you please give it a try?
rewrite ^ /index.php$is_args$args last;
try_files $uri $uri/ /index.php$is_args$args;
I guess appending a =404
just in case is a good idea, what do you think?
try_files $uri $uri/ /index.php$is_args$args =404;
location ~ \.php$ {
A few more questions and notes about this section:
-
The original config includes a
try_files
directive. I remember that there was some weird issue regardingPATH_INFO
support that caused nginx and PHP to differ in which file was requested, causing possible security issues because PHP happily interpreted files it wasn't supposed to. Can you please take a look?try_files $uri =404;
-
The original config includes a
fastcgi_index
directive. Is this necessary?fastcgi_index index.php;
-
Why isn't
fastcgi_param SCRIPT_FILENAME
part of the params snippet? Is this a special value the common config doesn't include, like our originalREQUEST_URI
? -
We should let Pico know that URL rewriting is available by adding the following:
# Let Pico know about available URL rewriting fastcgi_param PICO_URL_REWRITING 1;
Last but not least: Do you want to open a PR to incorporate these changes on both our website (see _docs/config.md
and in-depth/nginx.md
) and Pico itself (see content-sample/index.md
)?
I remember that there were some issues with specially crafted requests, requiring (/|$) after any path, no matter whether it's a directory or file. For files this was something about PATH_INFO support (even though Pico doesn't use it, PHP still fiddles with it;
I'll have to check, but I think the location block patterns are only used within NGINX itself and wouldn't affect the server PATH_INFO? I've never encountered patterns ending with (/|$) outside of this example, though, so it's not common practice.
I guess appending a =404 just in case is a good idea, what do you think?
That would break everything. 😅 Everything but the very last parameter in a try_files
directive is treated as a file path to look up, so it'll read index.php$is_args$args
as a file location and return index.php
as a downloadable file every time. Pico itself already handles not finding pages and returns its own 404 page, anyway, which provides a more consistent end-user experience.
As for the rest of the php-fpm block: a lot of those directives you're suspecting are needed are needed - but they're boilerplate that are included in the fastcgi snippets that come packaged with nginx. It's why I have the one line running snippets/fastcgi.conf
instead. It's expected to just use that and only set the parameters you personally need, like the URI or which version of PHP to run.
(edit: ... now that I've said that, I'm not actually sure why I included the SCRIPT_FILENAME parameter manually, it's not necessary. I do a lot of "copy my config for something else and tweak it because I'm lazy" when setting up my own sites so I guess it's an artefact from someplace I did need it. 😓)
(edit 2: Oh whoops, nope, updating my full config, I rediscovered why it's there - it screwed up my subdir site. I'll poke at it more to verify if it's something specific to my config or if it should be included in this sample as well before doing the PR.)
Here is an example of what that file contains. As you can see, the tedious do-every-time work like checking if the file exists and handling the PATH_INFO is already covered. (Which I think resolves your concern about the other regex location block, too.) This file is shipped with nginx when you install it and has been part of its default installation for a long time - a decade or so, I think? So you can definitely count on it being there.
And I'll be happy to do a PR! I'll get to that in the next couple days for sure.
p.s. using ^ instead of .* in the rewrite works great 👍
Glad to see the Nginx config receiving some attention. I got curious about some things, so I went digging in the commit history. Apparently it's been SEVEN years since I wrote the initial draft for it. No wonder I can't remember anything about it anymore!
Since I'm so out-of-touch with Nginx, I don't really have much to contribute here. Just wanted to share some quick findings.
I looked at my own (very old and untouched) config. It just uses return 404
for the forbidden stuff instead of try_files
. That's because at the time it was written, we weren't yet redirecting to Pico's internal 404 page.
So, I was looking through the file history and I found this commit: https://github.com/picocms/picocms.github.io/commit/8ccd6610646d31c950ee42f5c2c3b60878a824bc, where @PhrozenByte replaced large portions of the documentation. This is where try_files
came into play, though =404
wasn't added until a bit later.
So... uh... it's possible that the current config has never actually worked properly, since it sounds like, @PhrozenByte, you updated it purely by reading the docs and possibly never actually tested it? 🙈🫣
Anyway, thanks for all the help with this @InspectorCaracal. 😁
Since I wrote the original version of that page (long ago, on an enthusiastic Nginx-switching binge), I've felt pretty powerless, and a little guilty, seeing people struggle with Nginx more recently.
I've lost pretty much all the knowledge I once had on the subject though. 😅
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two days if no further activity occurs. Thank you for your contributions! :+1:
Fascinating...
I had noticed myself that the current rule does not work very well.
I have tested this snippet in a configuration that includes other directives as well, and it seems to be working perfectly (including the fastcgi block). Thanks for fixing!
Out of curiosity: If I replace rewrite .* /index.php$is_args$args last;
with return 404;
it seems to have the exact same result for somebody trying to retrieve a forbidden path. Or could there be some cases where it doesn't?