Feature Request: Support Separating `SCRIPT_NAME` and `PATH_INFO` in Python

Open OutOfFocus4 opened this issue 3 years ago • 8 comments

I am requesting that the Python module be updated to accept an optional script-name setting whose value is passed to WSGI applications in the SCRIPT_NAME environment variable, and that PATH_INFO be changed so that it no longer starts with the script-name path.

Currently, WSGI applications in Nginx Unit are passed the full request path in the PATH_INFO variable, even if the application can only be accessed on non-root paths. This differs from the behavior of Apache's mod_wsgi, which puts part of the path in SCRIPT_NAME and the remaining part in PATH_INFO.

The behavior of mod_wsgi means that a Django application only needs to be designed to handle its own URL routes, such as /admin/, but can be accessed using Apache at paths such as /django-app/admin/ without any changes to the application.

Because Nginx Unit puts the full path in the PATH_INFO variable, a Django application accessible at a non-root path such as /django must be configured to strip /django from the path before determining how to route the request, and if Unit is reconfigured to place the application under /app, the Django application would also need to be updated.
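To make the requested split concrete, here is a minimal sketch using the standard library's `wsgiref.util.shift_path_info`, which moves one path segment from PATH_INFO to SCRIPT_NAME. The starting environ is an assumption: it models a request to /django-app/admin/ with the full path in PATH_INFO and an empty SCRIPT_NAME.

```python
from wsgiref.util import shift_path_info

# Assumed starting point: the full request path is in PATH_INFO
# and SCRIPT_NAME is empty.
environ = {"SCRIPT_NAME": "", "PATH_INFO": "/django-app/admin/"}

# Move the first path segment from PATH_INFO to SCRIPT_NAME,
# which mirrors the mod_wsgi-style split described above.
shifted = shift_path_info(environ)

print(shifted)                 # django-app
print(environ["SCRIPT_NAME"])  # /django-app
print(environ["PATH_INFO"])    # /admin/
```

After the shift, the application only sees /admin/ and can use SCRIPT_NAME to build absolute URLs.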

OutOfFocus4 avatar Oct 23 '21 15:10 OutOfFocus4

Do you know how it's implemented in Gunicorn?

VBart avatar Oct 26 '21 12:10 VBart

> Because Nginx Unit puts the full path in the PATH_INFO variable, a Django application accessible at a non-root path such as /django must be configured to strip /django from the path before determining how to route the request, and if Unit is reconfigured to place the application under /app, the Django application would also need to be updated.

Note also that you can avoid updating the app sources in this case by passing a value to Django through an environment variable set in Unit's environment option: https://unit.nginx.org/configuration/#configuration-apps-common

VBart avatar Oct 26 '21 13:10 VBart

> Do you know how it's implemented in Gunicorn?

I haven't seen anything in the documentation or tutorials about having a single Gunicorn instance serve multiple applications at non-root paths.

> Because Nginx Unit puts the full path in the PATH_INFO variable, a Django application accessible at a non-root path such as /django must be configured to strip /django from the path before determining how to route the request, and if Unit is reconfigured to place the application under /app, the Django application would also need to be updated.
>
> Note also that you can avoid updating the app sources in this case by passing a value to Django through an environment variable set in Unit's environment option: https://unit.nginx.org/configuration/#configuration-apps-common

This is true, but currently the environment option only affects system environment variables, not WSGI environment variables. This means that any Django application that currently works with Apache and mod_wsgi would need to be updated to:

  1. Check os.environ for the script name
  2. Call Django's set_script_prefix to set the script name globally
  3. Strip the script name from the beginning of the WSGI environment's PATH_INFO so the request routes to the proper view
  4. Continue on as normal

Implementing my request (or something similar) would allow applications to be run under both Nginx Unit and Apache with no modifications.
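The steps above can be sketched as a small WSGI middleware, so the Django project itself stays untouched. This is only an illustration under stated assumptions: `ScriptNameMiddleware` and the `APP_SCRIPT_NAME` variable name are hypothetical, and the prefix is assumed to come from Unit's environment option. Since Django reads SCRIPT_NAME from the WSGI environ on each request (as noted elsewhere in this thread), setting it in the environ should cover step 2 without a separate set_script_prefix call.

```python
import os


class ScriptNameMiddleware:
    """Wrap a WSGI app so it sees SCRIPT_NAME/PATH_INFO split at a fixed prefix."""

    def __init__(self, app, script_name=None):
        self.app = app
        # APP_SCRIPT_NAME is a hypothetical variable name; it would be
        # set through the server's process environment.
        if script_name is None:
            script_name = os.environ.get("APP_SCRIPT_NAME", "")
        # SCRIPT_NAME must not end with a slash.
        self.script_name = script_name.rstrip("/")

    def __call__(self, environ, start_response):
        prefix = self.script_name
        path = environ.get("PATH_INFO", "")
        # Strip the prefix only on a path-segment boundary, so that
        # "/django" does not match "/djangoother".
        if prefix and (path == prefix or path.startswith(prefix + "/")):
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):]
        return self.app(environ, start_response)
```

In a Django project's wsgi.py this could wrap the usual callable, for example `application = ScriptNameMiddleware(get_wsgi_application())`. The drawback remains the one described above: every application needs this shim, whereas a server-side option would not.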

OutOfFocus4 avatar Oct 26 '21 14:10 OutOfFocus4

Sure, understood.

But what I dislike about the SCRIPT_NAME approach is that it adds an additional WSGI variable that has to be passed with every request, while in most cases it's just a constant that can be set once at the app's initialization stage.

VBart avatar Oct 26 '21 14:10 VBart

Django checks for the SCRIPT_NAME variable on every request, whether the WSGI variable is set or not. My bigger concern is stripping the script prefix from the path so Django can route it properly. This will have to be done either at the server level or at the application level, and I think doing it at the server level improves portability.

OutOfFocus4 avatar Oct 26 '21 14:10 OutOfFocus4

For context, I am part of a team that develops applications for internal use at my organization. These applications are developed using Django and are served by a single Apache Web Server instance using mod_wsgi.

During development, a Django application runs on a development server and is served from the root path of localhost. In production, the same application is served from a non-root path using the WSGIScriptAlias directive. mod_wsgi uses that directive to set SCRIPT_NAME and to remove the prefix from the start of PATH_INFO, so the Django application doesn't have to do it.

My team considered migrating from Apache to Unit because it would make migrating projects to newer Python versions easier, but the inability to host our applications without modification is preventing us from doing that.

If this feature (or something similar) is not added, I believe the Unit documentation should be updated to warn that Django applications cannot be served from non-root paths without modification.

OutOfFocus4 avatar Nov 08 '21 15:11 OutOfFocus4

We've implemented this with a subclass of WSGIHandler. Not ideal, as WSGIHandler is not supposed to be public...

import django
from django.core.handlers.wsgi import WSGIHandler

class TBWSGIHandler(WSGIHandler):
    def _set_script_name_path_info(self, environ):
        '''Move the first path segment from PATH_INFO to SCRIPT_NAME.

        Before:
          PATH_INFO=/one/two/three

        After:
          SCRIPT_NAME=/one
          PATH_INFO=/two/three
        '''
        # '/one/two/three'.split('/') -> ['', 'one', 'two', 'three']
        split_path = environ['PATH_INFO'].split('/')
        environ['SCRIPT_NAME'] = '/'.join(split_path[:2])
        environ['PATH_INFO'] = '/' + '/'.join(split_path[2:])

    def __call__(self, environ, start_response):
        # Rewrite the environ before Django resolves the URL.
        self._set_script_name_path_info(environ)
        return super().__call__(environ, start_response)

def get_tb_wsgi_application():
    """
    TB Shim around django wsgi handler
    """
    django.setup(set_prefix=False)
    return TBWSGIHandler()

krburkhart avatar Nov 11 '21 00:11 krburkhart

That would work for most of our applications, but this code only uses the first path segment as the script name, and we have some cases where we want the script name to consist of multiple path segments (e.g., /hr/forms/ serves the static file /hr/forms/index.html containing links to hiring and time-off forms, but /hr/forms/time-off/ and /hr/forms/hiring/ are both Django applications).
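For the multi-segment case, one possible variation is to match the path against a list of known mount points instead of always taking the first segment. The sketch below is an assumption-laden illustration: `split_mount` and `MOUNTS` are hypothetical names, and the mount list would have to be kept in sync with the server's routing by hand.

```python
def split_mount(path, mounts):
    """Return (script_name, path_info) using the longest matching mount."""
    # Try longer prefixes first so a deeper mount wins over a shallower one.
    for mount in sorted(mounts, key=len, reverse=True):
        mount = mount.rstrip("/")
        # Match only on a path-segment boundary.
        if path == mount or path.startswith(mount + "/"):
            return mount, path[len(mount):]
    # No mount matched: leave the path untouched.
    return "", path


# Hypothetical mount points taken from the example paths above.
MOUNTS = ["/hr/forms/time-off", "/hr/forms/hiring"]

print(split_mount("/hr/forms/time-off/request/", MOUNTS))
# ('/hr/forms/time-off', '/request/')
print(split_mount("/hr/forms/index.html", MOUNTS))
# ('', '/hr/forms/index.html')
```

The returned pair could then be written to SCRIPT_NAME and PATH_INFO in a handler like the one posted above.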

There are other workarounds we could use in those applications, and we may end up using them if we ultimately decide to migrate to Unit, but the ability to serve multiple Django applications from a single server (using different Python versions, even!) was a major reason we started investigating Unit in the first place.

The fact that multiple out-of-the-box Django applications cannot be served by a single server should at least be mentioned in the documentation so newcomers know about it from the beginning.

OutOfFocus4 avatar Nov 11 '21 12:11 OutOfFocus4