
Apache AH01630, every request to wsgi script is preceded by an access to DocumentRoot

Open theultramage opened this issue 2 years ago • 22 comments

I use a wsgi script via WSGIScriptAlias. Every time /script/path is requested, an internal non-logged access to /documentroot/path is attempted, for unknown reasons. My DocumentRoot restricts access to only specific files, so these strange access attempts get denied, and 'client denied by server configuration' is logged to the apache error log. The script itself runs fine.

I reproduced this by installing ubuntu 22 and freebsd 13.2, installing apache24 and mod_wsgi (4.9.0 and 4.9.2, respectively). I then commented out the 'Require all granted' line from the default DocumentRoot directory, copy-pasted the WSGIScriptAlias and Directory examples verbatim from the Configuration Guidelines to the end of httpd.conf, and placed a copy of the 'hello world' script at the appropriate spot.

For some reason, mod_wsgi is looking for the requested path under DocumentRoot:

$ wget http://localhost/myapp/test
[authz_core:error] [pid X] [client XXX] AH01630: client denied by server configuration: /usr/local/www/apache24/data/test

This happens after the wsgi script is checked for correctness, but before it is executed:

[core:trace5] [pid 1861] protocol.c(713): [client 127.0.0.1:20266] Request received from client: GET /myapp/test HTTP/1.1
[authz_core:debug] [pid 1861] mod_authz_core.c(815): [client 127.0.0.1:20266] AH01626: authorization result of Require all granted: granted
[authz_core:debug] [pid 1861] mod_authz_core.c(815): [client 127.0.0.1:20266] AH01626: authorization result of <RequireAny>: granted
[core:trace3] [pid 1861] request.c(360): [client 127.0.0.1:20266] request authorized without authentication by access_checker_ex hook: /myapp/test
[authz_core:debug] [pid 1861] mod_authz_core.c(815): [client 127.0.0.1:20266] AH01626: authorization result of Require all denied: denied
[authz_core:debug] [pid 1861] mod_authz_core.c(815): [client 127.0.0.1:20266] AH01626: authorization result of <RequireAny>: denied
[authz_core:error] [pid 1861] [client 127.0.0.1:20266] AH01630: client denied by server configuration: /usr/local/www/apache24/data/test
[core:trace3] [pid 1861] request.c(120): [client 127.0.0.1:20266] auth phase 'check access' gave status 403: /test
[wsgi:info] [pid 1861] mod_wsgi (pid=1861): Create interpreter '127.0.0.1|/myapp'.
[wsgi:info] [pid 1861] [client 127.0.0.1:20266] mod_wsgi (pid=1861, process='', application='127.0.0.1|/myapp'): Loading Python script file '/usr/local/wsgi/scripts/myapp.wsgi'.
[http:trace3] [pid 1861] http_filters.c(1129): [client 127.0.0.1:20266] Response sent with status 200, headers:

If instead the documentroot is permissive, the middle portion looks like this

[authz_core:debug] [pid 1976] mod_authz_core.c(815): [client 127.0.0.1:10652] AH01626: authorization result of Require all granted: granted
[authz_core:debug] [pid 1976] mod_authz_core.c(815): [client 127.0.0.1:10652] AH01626: authorization result of <RequireAny>: granted
[core:trace3] [pid 1976] request.c(360): [client 127.0.0.1:10652] request authorized without authentication by access_checker_ex hook: /test

As to why mod_wsgi is poking unrelated and nonexistent paths under DocumentRoot, I do not know. I did not find mention of this in the documentation, or a setting that would toggle this behavior off. My first guess was that this was doing some sort of opportunistic passthrough for static content (otherwise easily achievable via AliasMatch), but even if the path does exist, the result is still the wsgi script's 'hello world' output, so I don't think my guess is right. Ultimately, whatever this behavior is, it results in an unpleasant amount of error log noise even though everything seems to be set up correctly.
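
For anyone reproducing this, a small diagnostic script can stand in for the 'hello world' one to show which path-related variables the application actually receives. This is a minimal sketch, not the script from the documentation. It relies only on the standard WSGI callable signature; `REQUEST_URI` and `PATH_TRANSLATED` are CGI-style variables that the server may or may not provide, so missing ones are printed as `<unset>`.

```python
def application(environ, start_response):
    """Echo the path-related request variables as plain text."""
    keys = ("REQUEST_URI", "SCRIPT_NAME", "PATH_INFO", "PATH_TRANSLATED")
    # Variables the server did not set are reported as <unset>.
    lines = [f"{key}={environ.get(key, '<unset>')}" for key in keys]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Requesting /myapp/test with this script in place of myapp.wsgi should make it visible whether a PATH_TRANSLATED value pointing under DocumentRoot is being generated.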

theultramage avatar Apr 25 '23 10:04 theultramage

I would need to see the actual configuration you are using and not just the mod_wsgi bits, but also what DocumentRoot is set to and any active AddHandler, SetHandler, Alias directives. Show also whether these are inside of a VirtualHost, custom or default.

GrahamDumpleton avatar Apr 25 '23 10:04 GrahamDumpleton

As I said, everything is in its default state that apache installs with. DocumentRoot - "/usr/local/www/apache24/data" for freebsd, "/var/www/html" for ubuntu. Virtualhost - none on freebsd (using global httpd.conf), for ubuntu I tried putting the wsgi stuff both inside and outside of the default vhost, with no observable difference. Handlers/Aliases - none active (disabled default alias, mime and status on ubuntu just to test)

I used the tiniest amount of steps to reproduce this issue, and have done so successfully. I have used the sample code from the documentation, verbatim. The main thing that matters is to not leave unrestricted access to DocumentRoot and its subpaths, because that hides the issue.

DocumentRoot "/usr/local/www/apache24/data"
<Directory "/usr/local/www/apache24/data">
     ...
-    Require all granted
+    Require all denied
</Directory>

theultramage avatar Apr 25 '23 11:04 theultramage

Every Linux distro uses different default configurations for Apache, and they are different to standard Apache source distribution, so I am still more or less left guessing as I am not familiar with what each distro is currently using.

GrahamDumpleton avatar Apr 25 '23 11:04 GrahamDumpleton

Are you able to reproduce this in whatever test environment you have at hand? I suspect that the number of customizations doesn't matter. For starters, try setting LogLevel trace3, query a wsgi-aliased path and see if you get that secondary directory access. It is visible in the logs even in the positive case (see the log snippet above).

theultramage avatar Apr 25 '23 11:04 theultramage

I don't currently have a test environment where I could test it, which is in part why I was trying to glean as much information about the configuration as possible. The only way I can probably even test anything right now is using mod_wsgi-express, but it uses its own generated configuration customized for mod_wsgi and so is not going to replicate your case.

GrahamDumpleton avatar Apr 25 '23 11:04 GrahamDumpleton

What happens if you add:

<Location /myapp/>
Require all granted
</Location>

GrahamDumpleton avatar Apr 25 '23 11:04 GrahamDumpleton

So I managed to fudge something up with mod_wsgi-express by mounting at a sub URL path using --mount-point option, then manually editing the generated httpd.conf, denying access to document root, and forcing restart using signal while not exiting mod_wsgi-express. From that test I believe the suggestion about using Location directive should eliminate the problem.

GrahamDumpleton avatar Apr 25 '23 11:04 GrahamDumpleton

Alright, give me some time and I'll try to do a msvc source build and track down the originating code. Was hoping it wouldn't have to come to that since it's time and energy consuming to set up, and the devs usually already have a debuggable dev environment set up and ready to go.

The above does not change anything, since the unexpected secondary access is using / (DocumentRoot) as the base. Granting access to / and every subpath does work (equivalent to granting it to the documentroot directory, which is the default configuration for apache). But I don't think production websites run in such a permissive mode. Also I need to ask, is this secondary directory access, that checks if the scripted path exists as a file under documentroot, supposed to be happening, as part of some builtin functionality? Or is it unintended?

You do not need to edit httpd.conf. If the documentroot allows overrides, a .htaccess file can also work. And it doesn't even need to be a blanket deny, it can just deny the scripted subpaths as they would appear if they were redirected to documentroot. All that is just to make the issue more visible in the logs. Either apache or mod_wsgi is making these bogus access operations whether access is granted or not.

theultramage avatar Apr 25 '23 11:04 theultramage

The Location directive is not the same as Directory. The Location directive is saying anything under /myapp/ URL sub path is allowed. It is nothing to do with the file system and will only override things for your WSGI application mounted at /myapp/ URL path using WSGIScriptAlias, nothing else would be exposed. In my test at least it was enough to get rid of the auth errors.

GrahamDumpleton avatar Apr 25 '23 11:04 GrahamDumpleton

The following still logs an error on every access.

WSGIScriptAlias /myapp /usr/local/wsgi/scripts/myapp.wsgi
<Directory /usr/local/wsgi/scripts>
    Require all granted
</Directory>
<Location /myapp>
    Require all granted
</Location>
<Location /myapp/>
    Require all granted
</Location>

theultramage avatar Apr 25 '23 11:04 theultramage

If you are still seeing issues then use:

WSGIScriptAlias /myapp/ /usr/local/wsgi/scripts/myapp.wsgi

and not:

WSGIScriptAlias /myapp /usr/local/wsgi/scripts/myapp.wsgi

Does mean that people need to use /myapp/ URL sub path and not rely on auto trailing slash redirection.

GrahamDumpleton avatar Apr 25 '23 11:04 GrahamDumpleton

If I add that extra slash, WSGIScriptAlias treats the thing on the right as a directory path and directly appends the queried subpath, resulting in Target WSGI script not found or unable to stat: /usr/local/wsgi/scripts/myapp.wsgitest
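
The concatenation seen here is what a plain prefix substitution would produce. The following is a minimal sketch of that idea, assuming the alias prefix is stripped from the URL and the remainder is appended to the target as-is. `map_alias` is a hypothetical helper written for illustration, not mod_wsgi's actual matching code.

```python
def map_alias(url_path, alias, target):
    """Naive Alias-style mapping: strip the alias prefix, append the rest."""
    if not url_path.startswith(alias):
        return None
    return target + url_path[len(alias):]


script = "/usr/local/wsgi/scripts/myapp.wsgi"

# No trailing slash on the alias: the slash stays with the remainder.
no_slash = map_alias("/myapp/test", "/myapp", script)
# Trailing slash on the alias only: the slash is consumed, names run together.
left_only = map_alias("/myapp/test", "/myapp/", script)
# Trailing slash on both sides: the target puts the slash back.
both = map_alias("/myapp/test", "/myapp/", script + "/")

print(no_slash)
print(left_only)
print(both)
```

In this model the second call yields the same /usr/local/wsgi/scripts/myapp.wsgitest path as the error message, and adding a trailing slash to the target as well restores the separator.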

theultramage avatar Apr 25 '23 11:04 theultramage

Then use:

WSGIScriptAlias /myapp/ /usr/local/wsgi/scripts/myapp.wsgi/

Forgot that except for root of site, if use trailing slash on LHS, must use it on RHS. I think the docs even mention this. :-)

GrahamDumpleton avatar Apr 25 '23 12:04 GrahamDumpleton

Well, it mentions it in relation to mapping to directories, but same should apply if targeting a file.

When targeting a file the docs say:

The last option to the directive in this case must be a full pathname to the actual code file containing the WSGI application. A trailing slash should never be added to the last option when it is referring to an actual file.

but that warning is more about when there is no trailing slash on the LHS. A trailing slash can though be added on RHS if not root of site and is added on LHS.

Generally you wouldn't need to do this, but for this case of trying to disable access to document root you do, because I am guessing the trailing slash redirection check is being applied against document root, which is why you get the access error. So you need to avoid the trailing slash redirection step.

It is possible you may be able to use:

DirectorySlash Off

or being more specific:

WSGIScriptAlias /myapp /usr/local/wsgi/scripts/myapp.wsgi
<Location /myapp>
DirectorySlash Off
</Location>

You would have to play with it. The trailing slash stuff of Apache can be really stupid sometimes.

GrahamDumpleton avatar Apr 25 '23 12:04 GrahamDumpleton

Tried the above as well, it is still doing it. I'm assuming it's a coding bug, probably a really old one. I will compile, debug & step and capture events to locate the cause. It should be really easy to test, since I just need to watch for the error log write. Though it'll have to wait as I have already spent a day on this and now need to tend to other stuff.

theultramage avatar Apr 25 '23 12:04 theultramage

For these sorts of things it doesn't usually even get into mod_wsgi. It is caused by behaviours of mod_dir, mod_autoindex and mod_auth_??? modules in Apache.

BTW, where in order of Apache modules being loaded is the wsgi-module relative to these other modules?

The mod_wsgi code has to do some stupid stuff to set what relative order it wants things to happen.

static void wsgi_register_hooks(apr_pool_t *p)
{
    static const char * const p1[] = { "mod_alias.c", NULL };
    static const char * const n1[]= { "mod_userdir.c",
                                      "mod_vhost_alias.c", NULL };

    static const char * const n2[] = { "core.c", NULL };

#if !defined(MOD_WSGI_WITH_AUTHN_PROVIDER)
    static const char * const p3[] = { "mod_auth.c", NULL };
#endif
#if !defined(MOD_WSGI_WITH_AUTHZ_PROVIDER)
    static const char * const n4[] = { "mod_authz_user.c", NULL };
#endif
    static const char * const n5[] = { "mod_authz_host.c", NULL };

    static const char * const p6[] = { "mod_python.c", NULL };

    static const char * const p7[] = { "mod_ssl.c", NULL };

    ap_hook_post_config(wsgi_hook_init, p6, NULL, APR_HOOK_MIDDLE);
    ap_hook_child_init(wsgi_hook_child_init, p6, NULL, APR_HOOK_MIDDLE);

    ap_hook_translate_name(wsgi_hook_intercept, p1, n1, APR_HOOK_MIDDLE);
    ap_hook_handler(wsgi_hook_handler, NULL, NULL, APR_HOOK_MIDDLE);

#if defined(MOD_WSGI_WITH_DAEMONS)
    ap_hook_post_config(wsgi_hook_logio, NULL, n2, APR_HOOK_REALLY_FIRST);

    wsgi_header_filter_handle =
        ap_register_output_filter("WSGI_HEADER", wsgi_header_filter,
                                  NULL, AP_FTYPE_PROTOCOL);
#endif

#if !defined(MOD_WSGI_WITH_AUTHN_PROVIDER)
    ap_hook_check_user_id(wsgi_hook_check_user_id, p3, NULL, APR_HOOK_MIDDLE);
#else
    ap_register_provider(p, AUTHN_PROVIDER_GROUP, "wsgi",
                         AUTHN_PROVIDER_VERSION, &wsgi_authn_provider);
#endif
#if !defined(MOD_WSGI_WITH_AUTHZ_PROVIDER)
    ap_hook_auth_checker(wsgi_hook_auth_checker, NULL, n4, APR_HOOK_MIDDLE);
#else
    ap_register_provider(p, AUTHZ_PROVIDER_GROUP, "wsgi-group",
                         AUTHZ_PROVIDER_VERSION, &wsgi_authz_provider);
#endif
    ap_hook_access_checker(wsgi_hook_access_checker, p7, n5, APR_HOOK_MIDDLE);
}

If Apache has changed module names, or subtly changed in what phase it does things, then that could screw up the ordering.

In the case where there is no ordering dictated in mod_wsgi code, then precedence between different module handlers is dictated by the order in which they are loaded, so shifting where mod_wsgi module gets loaded can change things if this is now what is occurring.
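
To make the ordering idea concrete, the predecessor and successor lists passed to the `ap_hook_*` calls can be read as constraints in a dependency graph. The sketch below models only that part, using the `p1`/`n1` lists from the `ap_hook_translate_name` call above. It is a simplification: `order_hooks` is a hypothetical helper, and Apache's real sorting also takes the `APR_HOOK_*` position and module load order into account.

```python
from graphlib import TopologicalSorter


def order_hooks(hooks):
    """hooks maps name -> (predecessors, successors); returns one valid order."""
    sorter = TopologicalSorter()
    for name, (preds, succs) in hooks.items():
        sorter.add(name, *preds)      # name runs after each predecessor
        for succ in succs:
            sorter.add(succ, name)    # each successor runs after name
    return list(sorter.static_order())


# Constraints taken from p1 and n1 in wsgi_register_hooks().
translate_name = {
    "mod_wsgi.c": (["mod_alias.c"], ["mod_userdir.c", "mod_vhost_alias.c"]),
}
order = order_hooks(translate_name)
print(order)
```

Any module not named in such lists has no constraint relative to mod_wsgi in this model, which is where load order would come into play.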

GrahamDumpleton avatar Apr 25 '23 12:04 GrahamDumpleton

On ubuntu I disabled all configs and modules until I only had authz_core, mpm_prefork and wsgi. Also mod_dir is gone which avoids those automatic slashes you mentioned.

theultramage avatar Apr 25 '23 12:04 theultramage

Is quite possible I see differences because as I said I am using mod_wsgi-express. It doesn't actually use WSGIScriptAlias but a slightly different way using WSGIHandlerScript. I can't try and test with WSGIScriptAlias approach until tomorrow at the earliest.

GrahamDumpleton avatar Apr 25 '23 12:04 GrahamDumpleton

I have reproduced it on windows and am now able to debug it properly. The call stack is:

libhttpd.dll!ap_log_rerror_(...)  Line 1366
mod_authz_core.so!authorize_user_core(request_rec* r=0x00c97278, int after_authn=0)  Line 883 + 0x88 bytes
mod_authz_core.so!authorize_userless(request_rec* r=0x00c97278)  Line 917 + 0xb bytes
libhttpd.dll!ap_run_access_checker_ex(request_rec* r=0x00c97278)  Line 93 + 0x52 bytes
libhttpd.dll!ap_process_request_internal(request_rec* r=0x00c97278)  Line 339 + 0x9 bytes
libhttpd.dll!ap_sub_req_method_uri(const char* method="GET", const char* new_uri="/test", const request_rec* r=0x00c8ea58, ap_filter_t* next_filter=NULL)  Line 2289 + 0x9 bytes
libhttpd.dll!ap_sub_req_lookup_uri(const char* new_uri="/test", const request_rec* r=0x00c8ea58, ap_filter_t* next_filter=NULL)  Line 2302
libhttpd.dll!ap_add_cgi_vars(request_rec* r=0x00c8ea58)  Line 433 + 0x23 bytes
mod_wsgi.so!wsgi_build_environment(request_rec* r=0x00c8ea58)  Line 6290 + 0xc bytes
mod_wsgi.so!wsgi_hook_handler(request_rec* r=0x00c8ea58)  Line 7408 + 0x9 bytes
libhttpd.dll!ap_run_handler(request_rec* r=0x00c8ea58)  Line 170 + 0x52 bytes

The request structure contains:

unparsed_uri	"/myapp/test"
uri	"/myapp/test"
filename	"C:/httpd-2.4.57/Debug/wsgi-scripts/myapp.wsgi"
canonical_filename	"C:/httpd-2.4.57/Debug/wsgi-scripts/myapp.wsgi"
path_info	"/test"
used_path_info	AP_REQ_DEFAULT_PATH_INFO

The request is handled by mod_wsgi.c::wsgi_hook_handler(request_rec *r). It goes into mod_wsgi.c::wsgi_build_environment(request_rec *r). Then it calls util_script.c::ap_add_cgi_vars(request_rec *r). This function decides that e[REQUEST_URI] = /myapp/test, extracts e[SCRIPT_NAME] = /myapp, and sets e[PATH_INFO] = /test. Then it does this weird part:

    /* To get PATH_TRANSLATED, treat PATH_INFO as a URI path.
     * Need to re-escape it for this, since the entire URI was
     * un-escaped before we determined where the PATH_INFO began. */
     
>    pa_req = ap_sub_req_lookup_uri(ap_escape_uri(r->pool, r->path_info), r, NULL);
     char *pt = apr_pstrcat(r->pool, pa_req->filename, pa_req->path_info, NULL);
     apr_table_setn(e, "PATH_TRANSLATED", pt);

That just directly calls return ap_sub_req_method_uri("GET", new_uri, r, next_filter); where the uri is /test. This creates an internal subrequest to that path. Somewhere along the way, the fact that it is a subpath to an aliased script has been lost.
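
The effect can be modelled in a few lines. This is a simplified sketch of URI-to-file translation, assuming a single script alias and a fallback to DocumentRoot. `translate`, `DOCROOT` and `SCRIPT_ALIASES` are hypothetical names for illustration, with the paths taken from the FreeBSD setup above; it is not the httpd implementation.

```python
import posixpath

DOCROOT = "/usr/local/www/apache24/data"
SCRIPT_ALIASES = {"/myapp": "/usr/local/wsgi/scripts/myapp.wsgi"}


def translate(uri):
    """Return (filename, path_info) for a URI in this simplified model."""
    for mount, script in SCRIPT_ALIASES.items():
        if uri == mount or uri.startswith(mount + "/"):
            # Aliased script: the rest of the URI becomes PATH_INFO.
            return script, uri[len(mount):]
    # No alias matched: fall back to a file under DocumentRoot.
    return posixpath.join(DOCROOT, uri.lstrip("/")), ""


# The real request is matched by the alias.
filename, path_info = translate("/myapp/test")
# The subrequest only sees PATH_INFO, so the alias no longer matches.
path_translated, _ = translate(path_info)

print(filename, path_info)
print(path_translated)
```

Because the second lookup starts from /test alone, it lands on /usr/local/www/apache24/data/test, the same path that appears in the AH01630 message.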

theultramage avatar Apr 26 '23 23:04 theultramage

Hi, I have investigated further. I have tried using mod_cgi and ScriptAlias instead, and it also reproduces the issue. The cause is indeed ap_add_cgi_vars(), when generating PATH_TRANSLATED. According to the CGI RFC https://www.rfc-editor.org/rfc/rfc3875#section-4.1.6:

The PATH_TRANSLATED variable is derived by taking the PATH_INFO value, parsing it as a local URI in its own right, and performing any virtual-to-physical translation appropriate to map it onto the server's document repository structure. The value is derived in this way irrespective of whether it maps to a valid repository location.

So the fact that it generates a nonsensical URL that doesn't exist is intended. What isn't intended is the authz security checks and logged errors. The code does set the subrequest status to 403 HTTP_FORBIDDEN, but the caller doesn't care and just wants the resulting file path. I'll go report this to Apache httpd. You can close this, I guess.
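
Put as a toy model, the pattern looks like this. It is a sketch under stated assumptions: `SubRequest`, `lookup_uri` and `error_log` are hypothetical stand-ins for the subrequest, the lookup call and the Apache error log, and the access check is reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass
class SubRequest:
    status: int
    filename: str


error_log = []


def lookup_uri(uri, docroot, allowed):
    """Map the URI to a file and run the access check as a side effect."""
    filename = docroot + uri
    if not allowed:
        # The denial is logged even though nobody asked to serve the file.
        error_log.append(
            f"AH01630: client denied by server configuration: {filename}")
        return SubRequest(403, filename)
    return SubRequest(200, filename)


sub = lookup_uri("/test", "/usr/local/www/apache24/data", allowed=False)
# The caller only wants the path; the 403 status is never inspected.
path_translated = sub.filename
print(path_translated, sub.status, len(error_log))
```

The value of PATH_TRANSLATED comes out the same whether access is granted or denied; the only observable difference is the extra error log entry.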

theultramage avatar Apr 28 '23 09:04 theultramage

What could perhaps be done is to duplicate code for ap_add_cgi_vars() into mod_wsgi code base and remove the bits that aren't needed. It was used for convenience, but various things in there probably aren't required. Thus can bypass that stuff related to PATH_TRANSLATED. Even ap_add_common_vars() could perhaps be duplicated and similarly drop stuff that isn't needed.

GrahamDumpleton avatar Apr 28 '23 09:04 GrahamDumpleton

I found https://bz.apache.org/bugzilla/show_bug.cgi?id=43666 which proves that the issue has existed at least since 2007. I first came across it in the early 2010s when switching from mod_python to mod_wsgi, and, not knowing what was wrong, fiddled with the access permissions until the errors went away.

I added my findings to that bug report, and I will try to bring it to the developers' attention. It likely affects anything CGI-related.

theultramage avatar Apr 28 '23 10:04 theultramage