unit icon indicating copy to clipboard operation
unit copied to clipboard

Add "prefix" to Python configuration

Open OutOfFocus4 opened this issue 3 years ago • 29 comments

Updated version of #595, fixes #590

Allow configurations for Python applications to accept an optional script_name parameter. If set, the value must be either a string beginning with /, or an array of such strings. If not set, Unit will behave as before.

The script_name is used in the following ways:

  1. If script_name is a string and a request is handled whose path begins with the value of script_name:
    • In ASGI applications, the key-value pair "root_path": "<script_name>" is added to the scope dict
    • In WSGI applications, the key-value pair "SCRIPT_NAME": "<script_name>" is added to the environ dict, and the PATH_INFO in environ is set to all characters after those in script_name, or / if the request path is identical to script_name
  2. If script_name is an array of string, each it is iterated for each request, checking if the request path begins with any of its member strings. If a match is found:
    • In ASGI applications, the key-value pair "root_path": "<match>" is added to the scope dict
    • In WSGI applications, the key-value pair "SCRIPT_NAME": "<match>" is added to the environ dict, and the PATH_INFO in environ is set to all characters after those in match, or / if the request path is identical to match

Array elements are checked in the order they appear in the configuration JSON.

The script_name configuration variable accepts an empty string or empty array (but not an array of empty strings). In this case, it is ignored.

OutOfFocus4 avatar Nov 16 '21 23:11 OutOfFocus4

@VBart, I think this solves the use cases you brought up in the original pull request:

Thanks for the pull request. But it seems that this approach with setting static "script_name" path will limit possible use cases. E.g. what if some app needs to be provided on multiple web-sites via different URIs? Or even part of the app needs to be put on /, but other part should have some /prefix/?

  1. In the case of serving different paths for different websites, all of those paths could be in the script_name array.
  2. If both / and /prefix need to route to the same application, settings script_name to the string /prefix would be sufficient. Requests to / would not set the SCRIPT_NAME variable in environ, while requests to /prefix would remove /prefix from the request path and set the SCRIPT_NAME before calling the WSGI application.

What do you think? Are there any features I need to add before this pull request can be merged?

OutOfFocus4 avatar Nov 26 '21 14:11 OutOfFocus4

@OutOfFocus4 Using array here doesn't look convenient, as you have to iterate over array to find the right prefix and a user of array has to think about the ordering, how to avoid collisions. Also there are measures need to be made to separate these apps, e.g. on example.com there need to be prefix /pref1/, but on example.org, the prefix need to be /pref2/. Because they configured in the same array and applied sequentially to the same request, there's a possibility, that /pref2/ will be served for .com or vice versa /pref1/ will be served for .org, if it won't be properly filtered before.

We already have a way how to separate apps in one process, called targets: https://unit.nginx.org/configuration/#configuration-python-targets

I wonder, why did you avoid allowing configuring script_name in targets and rather went with that cumbersome array approach?

Also note the "fastcgi_split_path_info" directive in nginx, which gives wide flexibility of setting SCRIPT_NAME and PATH_INFO.

We also should think about a better name for the directive. Because script_name is just a mapping of CGI variable SCRIPT_NAME and has several problems:

  1. It's not very intuitive if you aren't familiar with CGI variables and their interpretation in Python
  2. It does more than just setting SCRIPT_NAME, while having the same name
  3. More and more users use ASGI, and for them root_path more familiar

We have to carefully evaluate every feature and its configuration. Because, once the decision is made and feature is implemented, then it's hard to change as users will start using it and any incompatible change will affect many setups.

I'd like to include this feature in Unit 1.27.0, but it needs to be polished as much as possible.

VBart avatar Nov 27 '21 01:11 VBart

@VBart

You are correct that script_name is a poor label for this directive, given its use cases. How about path_prefix or application_prefix?

It never occurred to me to use targets. What do you think of this?

  • The top-level python application object and members of the targets object will accept a path_prefix.
    • path_prefix can be either a static string beginning with "/" or a regular expression with exactly two capture groups. Static strings are internally converted to regular expressions.
    • targets with no path_prefix will use the top-level path_prefix if it exists.
    • For example, in the following application definition, the path_prefix of foo is "/", and "/bar" for the bar application:
{
    "applications": {
        "python-app": {
            "type": "python",
            "path": "/www/apps/python-app/",
            "path_prefix": "/bar",
            "targets": {
                "foo": {
                    "module": "foo.wsgi",
                    "callable": "foo",
                    "path_prefix": "/",
                },

                "bar": {
                    "module": "bar.wsgi",
                    "callable": "bar"
                }
            }
        }
    }
}
  • When handling a request to a Python application, the target is checked for a non-empty path_prefix. If the target has a path_prefix, it is compared to the request's path. If a match is found, the contents of the first capture group are used as the SCRIPT_NAME (for WSGI applications) or root_path (for ASGI applications). Additionally, in WSGI applications, the contents of the second capture group are used as the PATH_INFO, unless the second capture group is empty (for example, path_prefix is ^(/api)(/?.*)$ and the request path is /api), in which case PATH_INFO is "/".
  • If neither the targets nor the top-level application use the path_prefix directive, or the path_prefix regular expression does not match the request path, requests are handled like in 1.26.0

OutOfFocus4 avatar Nov 27 '21 13:11 OutOfFocus4

@OutOfFocus4 About the name:

  • application_prefix when put inside /applications/NAME/application_prefix repeats "application" word again. All options inside /applications/NAME/ already related to the application and there is no need to repeat it, otherwise we would add applicaton_ prefix almost to any option there. Also it's a quite long option name.
  • path_prefix has undesirable interference with the path option, which is about file system path, while the discussed option is about URI parts. So, uri_prefix will look better than path_prefix, but I suggest to use just prefix.

Another idea: uri_split, or prefix_split? But maybe just prefix is good enough. What do you think?

Currently Unit doesn't inherit any options into targets, because it feels like better to specify explicitly each parameter, even if it requires some copy'n'paste, than deal with complexity of inheritance and having to check what targets will be affected when something is changed on the top-level. Currently Unit just prohibits specifying target parameters on the top-level when targets option is used. Maybe we will change this someday and allow also on the top-level with inheritance, if there will be such requests, but currently I suggest to follow the same validation rules as with other targets options.

Also I suggest more clean distinction of static value from a regexp one. I suggest to use the same marker as used in the match patterns: ~. When the value starts with ~, then it will be interpreted as regexp.

Converting static strings into regexps isn't a good idea, it's better to do splitting in pure C, because:

  • Regexp has complex escaping rules and you will have to follow such rules in conversion, e.g. escape all dots, as . in regexp has special meaning. So, the conversion may be even more complex than pure implementation in C.
  • Unit can be built without PCRE dependency and hence without all the regexp functionality. It will be sad to lost any ability to set prefix in this case.

It will be a good idea to implement this option without regexp support first, and then as another pull request/patch enhance its functionality with regexps. You may even postpone this regexp enhancement unless there will be some feature requests about it. Thus by the time of implementation we will better understand the particular use case for regexp.

Also, it seems to me there's no need to require 2 capture groups for ASGI apps. The second one won't be used anyway.

VBart avatar Nov 27 '21 17:11 VBart

@VBart I updated the pull request. The changes to Unit are as follows:

  1. Added the prefix directive to Python applications and targets. If set, it must be a string beginning with "/". 1a. Like module and callable, prefix can only be used at the top-level if targets isn't used. 1b. Each target in targets has its own prefix; they are not shared or inherited in any way.
  2. When a Python application handles a request, if a prefix exists for the target or top-level application and the request's path begins with the same characters as prefix, then: 2a. For WSGI applications, set SCRIPT_NAME to prefix and PATH_INFO to all characters in path after the characters in prefix, or "/" if path and prefix contain the same characters 2b. For ASGI applications, set root_path in the scope dictionary to prefix

OutOfFocus4 avatar Nov 27 '21 23:11 OutOfFocus4

@OutOfFocus4 What about these two configs:

  1.  {
         "prefix": "/blog"
     }
    
  2.  {
         "prefix": "/blog/"
     }
    

?

I think it will be more intuitive and less confusing if they will work equally. Also, looking at your code I guess the first configuration for a request like: GET /blogger/something will set SCRIPT_NAME as /blog and PATH_INFO as ger/something, which doesn't look right.

I suggest to add ending / to prefix at the configuration stage if there's no one, but include it in the PATH_INFO part.

Note that according to CGI spec, PATH_INFO should either start with a / or be empty: https://datatracker.ietf.org/doc/html/rfc3875#section-4.1.5

And I'm pretty sure root_path can't point to the middle of URI segment as well.

VBart avatar Nov 28 '21 12:11 VBart

@VBart Thanks for catching that!

I updated the code to fix this. When setting up nxt_py_targets, any target with a prefix not ending in / will have / appended to the prefix.

When comparing the target's prefix to a request path, the process is now:

  1. Check if path is longer than or as long as prefix and the characters in prefix are at the beginning of path
  2. Check if prefix is one character longer than path and the characters in path are at the beginning of prefix

If (1) is true:

  • For ASGI applications, set root_path to prefix minus the trailing /
  • If WSGI, set SCRIPT_NAME prefix except for the trailing / and set PATH_INFO to all characters in path after those in prefix, but with a leading /.

If (2) is true:

  • For ASGI applications, set root_path to prefix minus the trailing /
  • For WSGI applications, set SCRIPT_NAME to prefix except for the trailing / and set PATH_INFO to /.

A prefix of "/" is pointless, since it makes SCRIPT_NAME and root_path the empty string and doesn't affect PATH_INFO.

OutOfFocus4 avatar Nov 28 '21 14:11 OutOfFocus4

Note that according to CGI spec, PATH_INFO should either start with a / or be empty: https://datatracker.ietf.org/doc/html/rfc3875#section-4.1.5

I just noticed this part of your response, and I updated the pull request again.

For WSGI applications, if path is equal to prefix without a trailing /, PATH_INFO is set to the empty string rather than /.

OutOfFocus4 avatar Nov 28 '21 17:11 OutOfFocus4

@VBart What do you think? Is there anything I should change?

OutOfFocus4 avatar Dec 02 '21 17:12 OutOfFocus4

I'll review the code later, probably next week. Currently is busy with 1.26.1 bugfix release.

VBart avatar Dec 02 '21 18:12 VBart

@VBart

Thanks for the feedback!

I updated the code. What do you think?

OutOfFocus4 avatar Dec 26 '21 15:12 OutOfFocus4

@OutOfFocus4

Thank you providing tests for the PR! Please apply following patch (added few more tests and style fixes):

# User Andrei Zeliankou <[email protected]>
# Date 1642600956 0
#      Wed Jan 19 14:02:36 2022 +0000
# Node ID 86bfec59c8a6312d2703b0079424b220fedd2774
# Parent  26cb1b12eb7f591ded7bf380d5798f0c0e076ae5
[mq]: prefix_fix

diff --git a/test/python/prefix/asgi.py b/test/python/prefix/asgi.py
--- a/test/python/prefix/asgi.py
+++ b/test/python/prefix/asgi.py
@@ -8,8 +8,7 @@ async def application(scope, receive, se
             'headers': [
                 (b'content-length', b'0'),
                 (b'prefix', scope.get('root_path', 'NULL').encode()),
-                (b'request-uri', scope['path'].encode()),
-            ]
+            ],
         }
     )

diff --git a/test/python/prefix/wsgi.py b/test/python/prefix/wsgi.py
--- a/test/python/prefix/wsgi.py
+++ b/test/python/prefix/wsgi.py
@@ -3,8 +3,8 @@ def application(environ, start_response)
         '200',
         [
             ('Content-Length', '0'),
-            ('Script-Name', environ.get('SCRIPT_NAME', 'No Script Name')),
-            ('Path-Info', environ['PATH_INFO'])
-        ]
+            ('Script-Name', environ.get('SCRIPT_NAME', 'NULL')),
+            ('Path-Info', environ['PATH_INFO']),
+        ],
     )
-    return []
\ No newline at end of file
+    return []
diff --git a/test/python/targets/asgi.py b/test/python/targets/asgi.py
--- a/test/python/targets/asgi.py
+++ b/test/python/targets/asgi.py
@@ -32,8 +32,7 @@ async def application_prefix(scope, rece
             'headers': [
                 (b'content-length', b'0'),
                 (b'prefix', scope.get('root_path', 'NULL').encode()),
-                (b'request-uri', scope['path'].encode()),
-            ]
+            ],
         }
     )

diff --git a/test/test_asgi_application.py b/test/test_asgi_application.py
--- a/test/test_asgi_application.py
+++ b/test/test_asgi_application.py
@@ -70,60 +70,39 @@ custom-header: BLAH
     def test_asgi_application_prefix(self):
         self.load('prefix', prefix='/api/rest')

-        resp = self.get(url='/api/rest/get')
-        assert (
-            resp['headers']['prefix'] == '/api/rest'
-        ), 'prefix header'
-        assert (
-            resp['headers']['request-uri'] == '/api/rest/get'
-        ), 'path-info header'
-
-    def test_asgi_application_prefix_ignore_separator(self):
-        self.load('prefix', prefix='/api/rest/')
+        def set_prefix(prefix):
+            self.conf('"' + prefix + '"', 'applications/prefix/prefix')

-        resp = self.get(url='/api/rest')
-        assert (
-            resp['headers']['prefix'] == '/api/rest'
-        ), 'prefix header'
-        assert (
-            resp['headers']['request-uri'] == '/api/rest'
-        ), 'path-info header'
-
-    def test_asgi_application_prefix_root_path(self):
-        self.load('prefix', prefix='/')
-
-        resp = self.get(url='/')
+        def check_prefix(url, prefix):
+            resp = self.get(url=url)
+            assert resp['status'] == 200
+            assert resp['headers']['prefix'] == prefix

-        assert resp['status'] == 200
-        assert resp['headers']['prefix'] == 'NULL'
-        assert resp['headers']['request-uri'] == '/'
-
-        resp = self.get(url='/app')
-
-        assert resp['status'] == 200
-        assert resp['headers']['prefix'] == 'NULL'
-        assert resp['headers']['request-uri'] == '/app'
-
-    def test_asgi_application_prefix_no_match(self):
-        self.load('prefix', prefix='/app')
-
-        resp = self.get(url='/api/')
+        check_prefix('/ap', 'NULL')
+        check_prefix('/api', 'NULL')
+        check_prefix('/api/', 'NULL')
+        check_prefix('/api/res', 'NULL')
+        check_prefix('/api/restful', 'NULL')
+        check_prefix('/api/rest', '/api/rest')
+        check_prefix('/api/rest/', '/api/rest')
+        check_prefix('/api/rest/get', '/api/rest')
+        check_prefix('/api/rest/get/blah', '/api/rest')

-        assert (
-            resp['headers']['prefix'] == 'NULL'
-        ), 'asgi prefix header'
-        assert (
-            resp['headers']['request-uri'] == '/api/'
-        ), 'asgi request-uri header'
+        set_prefix('/api/rest/')
+        check_prefix('/api/rest', '/api/rest')
+        check_prefix('/api/restful', 'NULL')
+        check_prefix('/api/rest/', '/api/rest')
+        check_prefix('/api/rest/blah', '/api/rest')

-        resp = self.get(url='/application/')
+        set_prefix('/app')
+        check_prefix('/ap', 'NULL')
+        check_prefix('/app', '/app')
+        check_prefix('/app/', '/app')
+        check_prefix('/application/', 'NULL')

-        assert (
-            resp['headers']['prefix'] == 'NULL'
-        ), 'asgi prefix header'
-        assert (
-            resp['headers']['request-uri'] == '/application/'
-        ), 'asgi request-uri header'
+        set_prefix('/')
+        check_prefix('/', 'NULL')
+        check_prefix('/app', 'NULL')

     def test_asgi_application_query_string_space(self):
         self.load('query_string')
diff --git a/test/test_asgi_targets.py b/test/test_asgi_targets.py
--- a/test/test_asgi_targets.py
+++ b/test/test_asgi_targets.py
@@ -96,31 +96,41 @@ class TestASGITargets(TestApplicationPyt
                 "1": {
                     "module": "asgi",
                     "callable": "application_prefix",
-                    "prefix": "/1/"
+                    "prefix": "/1/",
                 },
                 "2": {
                     "module": "asgi",
                     "callable": "application_prefix",
                     "prefix": "/api",
-                }
+                },
             }
         )
-        self.conf_post(
-            {
-                "match": {"uri": "*"},
-                "action": {"pass": "applications/targets/2"},
-            },
+        self.conf(
+            [
+                {
+                    "match": {"uri": "/1*"},
+                    "action": {"pass": "applications/targets/1"},
+                },
+                {
+                    "match": {"uri": "*"},
+                    "action": {"pass": "applications/targets/2"},
+                },
+            ],
             "routes",
         )
-        resp = self.get(url='/1')
-        assert resp['status'] == 200
-        assert resp['headers']['prefix'] == '/1'
-        resp = self.get(url='/api/test/')
-        assert resp['status'] == 200
-        assert resp['headers']['prefix'] == '/api'
-        resp = self.get(url='/2')
-        assert resp['status'] == 200
-        assert resp['headers']['prefix'] == 'NULL'
-        resp = self.get(url='/apis/')
-        assert resp['status'] == 200
-        assert resp['headers']['prefix'] == 'NULL'
+
+        def check_prefix(url, prefix):
+            resp = self.get(url=url)
+            assert resp['status'] == 200
+            assert resp['headers']['prefix'] == prefix
+
+        check_prefix('/1', '/1')
+        check_prefix('/11', 'NULL')
+        check_prefix('/1/', '/1')
+        check_prefix('/', 'NULL')
+        check_prefix('/ap', 'NULL')
+        check_prefix('/api', '/api')
+        check_prefix('/api/', '/api')
+        check_prefix('/api/test/', '/api')
+        check_prefix('/apis', 'NULL')
+        check_prefix('/apis/', 'NULL')
diff --git a/test/test_configuration.py b/test/test_configuration.py
--- a/test/test_configuration.py
+++ b/test/test_configuration.py
@@ -325,7 +325,7 @@ class TestConfiguration(TestControl):
                     "match": {"uri": "/app/*"},
                     "action": {"pass": "applications/sub-app"},
                 }
-            ]
+            ],
         }

         assert 'success' in self.conf(conf)
@@ -338,15 +338,12 @@ class TestConfiguration(TestControl):
                     "processes": {"spare": 0},
                     "path": "/app",
                     "targets": {
-                        "foo": {
-                            "module": "foo.wsgi",
-                            "prefix": "/app",
-                        },
+                        "foo": {"module": "foo.wsgi", "prefix": "/app"},
                         "bar": {
                             "module": "bar.wsgi",
                             "callable": "bar",
                             "prefix": "/api",
-                        }
+                        },
                     },
                 }
             },
@@ -360,7 +357,7 @@ class TestConfiguration(TestControl):
                     "match": {"uri": "/api/*"},
                     "action": {"pass": "applications/sub-app/bar"},
                 },
-            ]
+            ],
         }

         assert 'success' in self.conf(conf)
@@ -379,11 +376,7 @@ class TestConfiguration(TestControl):
             "listeners": {"*:7080": {"pass": "applications/sub-app"}},
         }

-        resp = self.conf(conf)
-        assert 'error' in resp
-        assert resp['detail'] == (
-            'The "prefix" must be a string beginning with "/".'
-        )
+        assert 'error' in self.conf(conf)

     def test_json_application_empty_python_prefix(self):
         conf = {
@@ -399,11 +392,7 @@ class TestConfiguration(TestControl):
             "listeners": {"*:7080": {"pass": "applications/sub-app"}},
         }

-        resp = self.conf(conf)
-        assert 'error' in resp
-        assert resp['detail'] == (
-            'The "prefix" must be a string beginning with "/".'
-        )
+        assert 'error' in self.conf(conf)

     def test_json_application_many2(self):
         conf = {
diff --git a/test/test_python_application.py b/test/test_python_application.py
--- a/test/test_python_application.py
+++ b/test/test_python_application.py
@@ -94,69 +94,43 @@ custom-header: BLAH
             resp['headers']['Query-String'] == ' var1= val1 & var2=val2'
         ), 'Query-String space 4'

-    def test_python_application_prefix_strip(self):
-        self.load('prefix', prefix='/test')
-
-        resp = self.get(url='/test/')
-
-        assert resp['status'] == 200, 'script name strip status'
-        assert (
-            resp['headers']['Script-Name'] == '/test'
-        ), 'script name strip script name'
-        assert (
-            resp['headers']['Path-Info'] == '/'
-        ), 'script name strip path info'
-
-    def test_python_application_prefix_segment_match(self):
-        self.load('prefix', prefix='/blog')
-
-        resp = self.get(url='/blogger/')
-
-        assert resp['status'] == 200
-        assert resp['headers']['Script-Name'] == 'No Script Name'
-        assert resp['headers']['Path-Info'] == '/blogger/'
-
-    def test_python_application_prefix_ignore_separator(self):
-        self.load('prefix', prefix='/test/')
-
-        resp = self.get(url='/test')
-
-        assert resp['status'] == 200
-        assert resp['headers']['Script-Name'] == '/test'
-        assert resp['headers']['Path-Info'] == ''
-
-    def test_python_application_prefix_root_path(self):
-        self.load('prefix', prefix='/')
-
-        resp = self.get(url='/django-app')
-
-        assert resp['status'] == 200
-        assert resp['headers']['Script-Name'] == 'No Script Name'
-        assert resp['headers']['Path-Info'] == '/django-app'
-
-        resp = self.get(url='/')
-
-        assert resp['status'] == 200
-        assert resp['headers']['Script-Name'] == 'No Script Name'
-        assert resp['headers']['Path-Info'] == '/'
-
-    def test_python_application_prefix_empty_path(self):
-        self.load('prefix', prefix='/test')
-
-        resp = self.get(url='/test')
-
-        assert resp['status'] == 200
-        assert resp['headers']['Script-Name'] == '/test'
-        assert resp['headers']['Path-Info'] == ''
-
-    def test_python_application_prefix_no_match(self):
+    def test_python_application_prefix(self):
         self.load('prefix', prefix='/api/rest')

-        resp = self.get(url='/api/get')
+        def set_prefix(prefix):
+            self.conf('"' + prefix + '"', 'applications/prefix/prefix')
+
+        def check_prefix(url, script_name, path_info):
+            resp = self.get(url=url)
+            assert resp['status'] == 200
+            assert resp['headers']['Script-Name'] == script_name
+            assert resp['headers']['Path-Info'] == path_info

-        assert resp['status'] == 200
-        assert resp['headers']['Script-Name'] == 'No Script Name'
-        assert resp['headers']['Path-Info'] == '/api/get'
+        check_prefix('/ap', 'NULL', '/ap')
+        check_prefix('/api', 'NULL', '/api')
+        check_prefix('/api/', 'NULL', '/api/')
+        check_prefix('/api/res', 'NULL', '/api/res')
+        check_prefix('/api/restful', 'NULL', '/api/restful')
+        check_prefix('/api/rest', '/api/rest', '')
+        check_prefix('/api/rest/', '/api/rest', '/')
+        check_prefix('/api/rest/get', '/api/rest', '/get')
+        check_prefix('/api/rest/get/blah', '/api/rest', '/get/blah')
+
+        set_prefix('/api/rest/')
+        check_prefix('/api/rest', '/api/rest', '')
+        check_prefix('/api/restful', 'NULL', '/api/restful')
+        check_prefix('/api/rest/', '/api/rest', '/')
+        check_prefix('/api/rest/blah', '/api/rest', '/blah')
+
+        set_prefix('/app')
+        check_prefix('/ap', 'NULL', '/ap')
+        check_prefix('/app', '/app', '')
+        check_prefix('/app/', '/app', '/')
+        check_prefix('/application/', 'NULL', '/application/')
+
+        set_prefix('/')
+        check_prefix('/', 'NULL', '/')
+        check_prefix('/app', 'NULL', '/app')

     def test_python_application_query_string_empty(self):
         self.load('query_string')
diff --git a/test/test_python_targets.py b/test/test_python_targets.py
--- a/test/test_python_targets.py
+++ b/test/test_python_targets.py
@@ -54,13 +54,13 @@ class TestPythonTargets(TestApplicationP
                 "listeners": {"*:7080": {"pass": "routes"}},
                 "routes": [
                     {
-                        "match": {"uri": ["/app", "/app/*"]},
+                        "match": {"uri": "/app*"},
                         "action": {"pass": "applications/targets/app"},
                     },
                     {
                         "match": {"uri": "*"},
                         "action": {"pass": "applications/targets/catchall"},
-                    }
+                    },
                 ],
                 "applications": {
                     "targets": {
@@ -82,26 +82,20 @@ class TestPythonTargets(TestApplicationP
                             },
                         },
                     }
-                }
+                },
             }
         )

-        resp = self.get(url='/app')
-        assert resp['status'] == 200
-        assert resp['body'] == '/app '
-
-        resp = self.get(url='/app/rest/user/')
-        assert resp['status'] == 200
-        assert resp['body'] == '/app /rest/user/'
+        def check_prefix(url, body):
+            resp = self.get(url=url)
+            assert resp['status'] == 200
+            assert resp['body'] == body

-        resp = self.get(url='/catchall')
-        assert resp['status'] == 200
-        assert resp['body'] == 'No Script Name /catchall'
-
-        resp = self.get(url='/apis')
-        assert resp['status'] == 200
-        assert resp['body'] == 'No Script Name /apis'
-
-        resp = self.get(url='/api/users/')
-        assert resp['status'] == 200
-        assert resp['body'] == '/api /users/'
+        check_prefix('/app', '/app ')
+        check_prefix('/app/', '/app /')
+        check_prefix('/app/rest/user/', '/app /rest/user/')
+        check_prefix('/catchall', 'No Script Name /catchall')
+        check_prefix('/api', '/api ')
+        check_prefix('/api/', '/api /')
+        check_prefix('/apis', 'No Script Name /apis')
+        check_prefix('/api/users/', '/api /users/')

andrey-zelenkov avatar Jan 19 '22 14:01 andrey-zelenkov

@andrey-zelenkov @VBart I applied the patch. Anything else?

OutOfFocus4 avatar Jan 20 '22 17:01 OutOfFocus4

@OutOfFocus4 Please apply few more diff's from my last patch:

diff --git a/test/python/targets/asgi.py b/test/python/targets/asgi.py
--- a/test/python/targets/asgi.py
+++ b/test/python/targets/asgi.py
@@ -32,8 +32,7 @@ async def application_prefix(scope, rece
             'headers': [
                 (b'content-length', b'0'),
                 (b'prefix', scope.get('root_path', 'NULL').encode()),
-                (b'request-uri', scope['path'].encode()),
-            ]
+            ],
         }
     )
 
diff --git a/test/test_python_targets.py b/test/test_python_targets.py
--- a/test/test_python_targets.py
+++ b/test/test_python_targets.py
@@ -54,13 +54,13 @@ class TestPythonTargets(TestApplicationP
                 "listeners": {"*:7080": {"pass": "routes"}},
                 "routes": [
                     {
-                        "match": {"uri": ["/app", "/app/*"]},
+                        "match": {"uri": "/app*"},
                         "action": {"pass": "applications/targets/app"},
                     },
                     {
                         "match": {"uri": "*"},
                         "action": {"pass": "applications/targets/catchall"},
-                    }
+                    },
                 ],
                 "applications": {
                     "targets": {
@@ -82,26 +82,20 @@ class TestPythonTargets(TestApplicationP
                             },
                         },
                     }
-                }
+                },
             }
         )
 
-        resp = self.get(url='/app')
-        assert resp['status'] == 200
-        assert resp['body'] == '/app '
-
-        resp = self.get(url='/app/rest/user/')
-        assert resp['status'] == 200
-        assert resp['body'] == '/app /rest/user/'
+        def check_prefix(url, body):
+            resp = self.get(url=url)
+            assert resp['status'] == 200
+            assert resp['body'] == body
 
-        resp = self.get(url='/catchall')
-        assert resp['status'] == 200
-        assert resp['body'] == 'No Script Name /catchall'
-
-        resp = self.get(url='/apis')
-        assert resp['status'] == 200
-        assert resp['body'] == 'No Script Name /apis'
-
-        resp = self.get(url='/api/users/')
-        assert resp['status'] == 200
-        assert resp['body'] == '/api /users/'
+        check_prefix('/app', '/app ')
+        check_prefix('/app/', '/app /')
+        check_prefix('/app/rest/user/', '/app /rest/user/')
+        check_prefix('/catchall', 'No Script Name /catchall')
+        check_prefix('/api', '/api ')
+        check_prefix('/api/', '/api /')
+        check_prefix('/apis', 'No Script Name /apis')
+        check_prefix('/api/users/', '/api /users/')

andrey-zelenkov avatar Jan 20 '22 17:01 andrey-zelenkov

@andrey-zelenkov Sorry about that. All of your changes should now be applied to the code.

OutOfFocus4 avatar Jan 20 '22 19:01 OutOfFocus4

Merged recent commits into my fork to ensure no conflicts.

@VBart & @andrey-zelenkov Are there any other steps I need to take?

OutOfFocus4 avatar Feb 09 '22 14:02 OutOfFocus4

Merged recent commits into my fork to ensure no conflicts.

OutOfFocus4 avatar May 03 '22 11:05 OutOfFocus4

@tippexs @alejandro-colomar @artemkonev Can someone please look at this?

OutOfFocus4 avatar May 08 '22 11:05 OutOfFocus4

@tippexs @alejandro-colomar @artemkonev Can someone please look at this?

Yes. I'll take a look soon.

alejandro-colomar avatar May 11 '22 22:05 alejandro-colomar

@OutOfFocus4 looked at this again. We hab lately a discussion about the script name and did a lot of research lately. The SCRIPT_NAME can be set using the environment variable as well. That means you can simply set the value if needed. Will that be a fix for you in general?

Timo

tippexs avatar Jun 19 '22 11:06 tippexs

@tippexs, I'm afraid it is not a matter of "simply" setting an environment variable.

A developer can set a SCRIPT_NAME environment variable, but that variable, along with most other environment variables, will not be part of the environment dictionary passed to the application, so they must manually set the script prefix in their WSGI applications.

In addition, environment variables are set in the application object, so if a WSGI application is made available at multiple paths using targets, all of those targets will share a SCRIPT_NAME value.

Please let me know if I'm incorrect in my analysis.

OutOfFocus4 avatar Jun 20 '22 13:06 OutOfFocus4

Hi @OutOfFocus4 - Thanks for making that clear. I was checking that again and you are right. Environment variables are not included in the Python applications context.

Regarding the naming of the new variable and where to put it. So basically SCRIPT_NAME as well as some other configuration options names are not defined by Unit. We can just pick the names up and let the user configure it. Telling the user whats available and for what it can be used is something we can / should cover with our documentation or at least link to the right python / python frameworks docs.

We already have PHP Options to configure the PHP app context using an options object in the configuration syntax. Here is the PHP example.

{
    "type": "php",
    "processes": 20,
    "root": "/www/blogs/scripts/",
    "user": "www-blogs",
    "group": "www-blogs",
    "options": {
        "file": "/etc/php.ini",
        "admin": {
            "memory_limit": "256M",
            "variables_order": "EGPCS"
        },

        "user": {
            "display_errors": "0"
        }
    }
}

So, what do you think about create an options object in the target like

{
    "applications": {
        "python-app": {
            "type": "python",
            "path": "/www/apps/python-app/",
            "targets": {
                "foo": {
                    "module": "foo.wsgi",
                    "callable": "foo",
                    "options":  {
                       "script_name": "/foo/"
                     }
                },

                "bar": {
                    "module": "bar.wsgi",
                    "callable": "bar"
                }
            }
        }
    }
}

This means we would give users the oportunity to define the runtime configuration params as they want. SCRIPT_NAME is one of them but what if they want to change some others? With this we are more flexible for future changes.

Sorry for stressing this topic about naming and stuff again but given we can bring that down to a final solution we can ship this enhancement within the next releases.

tippexs avatar Jun 20 '22 23:06 tippexs

@tippexs I have no problem changing the name to script_name or script_prefix or script_alias or something else, nor do I have an issue with moving it to be inside an options object.

However, the SCRIPT_NAME is more than just a variable in the environ dict; it is used to strip path information from the request path to aid routing.

For example, when I develop a Django application, I can run it on a test server and go to http://localhost:8000/tickets/ to test the application properly routes and renders the view at /tickets/. In production, however, that application might be under the path /help-desk/, and the tickets view would therefore be located at /help-desk/tickets/.

Currently, unit will send the full path /help-desk/tickets/ to the application, which the application won't be able to properly route. With this pull request, if the prefix (or script_name or script_prefix or script_alias) is set to /help-desk, then /help-desk will be stripped from the beginning of the path, and the application will properly route the request to the view for /tickets/.

OutOfFocus4 avatar Jun 20 '22 23:06 OutOfFocus4

@OutOfFocus4 thanks for answering so quickly. I have just reviewed the python envrion documentation https://peps.python.org/pep-3333/#environ-variables

and now it is clear. The options object does NOT make sense here. Sorry for the convusion. As SCRIPT_NAME is the last missing envrion according to the spec.

@alejandro-colomar and I are fine with prefix. We will share our findings internally tomorrow and update this right after. @alejandro-colomar will review your PR once more and let you know any concerns.

Fyi, at the moment we are working on a similiar thing with Ruby/Rails. As Ruby follows the same Spec then python it should work the exact same way. But this is out of this discussion. Timo

tippexs avatar Jun 21 '22 00:06 tippexs

@tippexs @hongzhidao @alejandro-colomar I haven’t heard anything in a while. Do I need to make any changes to this?

OutOfFocus4 avatar Sep 13 '22 11:09 OutOfFocus4

@tippexs @hongzhidao @alejandro-colomar I haven’t heard anything in a while. Do I need to make any changes to this?

Hi!

Sorry, I was doing other things and let this aside for a too long time. I left some review before I go on vacation. I'll be back next month.

Cheers, Alex

alejandro-colomar avatar Sep 13 '22 12:09 alejandro-colomar

What did change in the last force fush? Due to the lot changes in the master branch, the "Compare" button doesn't show nicely the difference in this PR.

alejandro-colomar avatar Sep 13 '22 14:09 alejandro-colomar

What did change in the last force fush? Due to the lot changes in the master branch, the "Compare" button doesn't show nicely the difference in this PR.

Sorry about that. I replaced the check (prefix.length < 1 || prefix.start[0] != '/') with (!nxt_strchr_start(&prefix, '/')) like you suggested and removed more unused code I found.

My most recent push splits the unused code removal from the rest of the PR.

OutOfFocus4 avatar Sep 13 '22 14:09 OutOfFocus4

The changes look good to me. Reviewed-by: Alejandro Colomar <[email protected]>

However, I have the small concerns about reorganizing the code that I mentioned in comments (which don't affect behavior at all).

alejandro-colomar avatar Oct 14 '22 11:10 alejandro-colomar

Could you please add some information in the commit message?

You could add a condensed version of the discussion in this github page. I'm interested in why the patch is necessary, and how it is expected to be used.

Also, if you could sign the patches, it would be great. https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin

Thanks! Alex

alejandro-colomar avatar Oct 24 '22 23:10 alejandro-colomar