datasette icon indicating copy to clipboard operation
datasette copied to clipboard

base_url getting appended twice in redirects when applying filters?

Open EternityForest opened this issue 1 year ago • 9 comments

When running under a custom base URL, the "Apply" button for filters does not seem to work. It redirects to a url starting with /baseurl/baseurl/ and the extra copy of the base URL messes things up.

I'm able to get things working with the following changes, but I'm not sure if the whole thing is just user error on my end. If not, I can make a PR.

class Urls:
    def __init__(self, ds):
        self.ds = ds

    def path(self, path, format=None, already_prefixed=False):
        if not isinstance(path, PrefixedUrlString):
            if path.startswith("/"):
                path = path[1:]
            if not already_prefixed:
                path = self.ds.setting("base_url") + path
            else:
                path = "/" + path
        if format is not None:
            path = path_with_format(path=path, format=format)
        return PrefixedUrlString(path)
async def _redirect_if_needed(datasette, request, resolved):
    # Handle ?_filter_column
    redirect_params = filters_should_redirect(request.args)
    if redirect_params:
        return _redirect(
            datasette,
            request,
            datasette.urls.path(path_with_added_args(request, redirect_params), already_prefixed=True),
            forward_querystring=False,
        )

EternityForest avatar May 15 '24 06:05 EternityForest

Yeah we're seeing this too only on specific routes. The one that's breaker for us is the CSV export URL. Potentially related to https://github.com/simonw/datasette/issues/1619

I'm gonna work on this issue next week because it's a show stopper for us using Datasette in prod.

brandonrobertz avatar Jun 29 '24 21:06 brandonrobertz

I traced the problem down to two areas in the Datasette code and am working on a fix.

The first is in datasette/url_builder.py inside the Urls.path method:

https://github.com/simonw/datasette/blob/56adfff8d2112c5890888af4da65d842ba7f6f86/datasette/url_builder.py#L11-L13

In the last line, sometimes the path here already has the base_url prepended and the path is not an instance of PrefixedUrlString so the base_url gets added twice.

The second issue is datasette/app.py inside the absolute_url method: https://github.com/simonw/datasette/blob/56adfff8d2112c5890888af4da65d842ba7f6f86/datasette/app.py#L1151

The path sometimes comes in here with a base_url prepended but without a leading slash and the request.url also has the base_url at the end of the URL so this results in a double prepend of the base_url.

I'm currently adding some tests, but it's very unclear what the intended functionality was with these methods. So far I have test cases on the datasette.urls.path method like so:

@pytest.mark.parametrize(
    "base_url,path,format,expected",
    [
        ("/", "/", None, "/"),
        (None, "/", None, "/"),
        (None, "", None, ""),
        ("/prefix/", "/prefix/table/1", None, "/prefix/table/1"),
        ("/prefix/", "prefix/table/1", None, "prefix/table/1"),
        ("/prefix/", "/table/1", None, "/prefix/table/1"),
        ("/prefix/", "/table/1", "json", "/prefix/table/1.json"),
        ("/prefix/", "-/static/app.css", None, "/prefix/-/static/app.css"),
        (None, "table/1", None, "table/1"),
        (None, "table/1", "csv", "table/1.csv"),
        (None, "-/static/app.css", None, "-/static/app.css"),
    ],
)
def test_url_path_construction(ds, base_url, path, format, expected):
    ds._settings["base_url"] = base_url
    actual = ds.urls.path(path, format=format)
    assert actual == expected
    assert isinstance(actual, PrefixedUrlString)

Working on profiling the app to see how it seems to be expecting to handle URLs and will add more tests onto path as well as absolute_url.

brandonrobertz avatar Jul 02 '24 18:07 brandonrobertz

Tracking code here: https://github.com/brandonrobertz/datasette/tree/fix-repeated-base_url

brandonrobertz avatar Jul 02 '24 20:07 brandonrobertz

Is there any change this one get merged? Very annoying bug :(

nvkv avatar Oct 24 '24 22:10 nvkv

Is there any change this one get merged? Very annoying bug :(

Not merged but if you want to test this branch LMK how it goes (https://github.com/brandonrobertz/datasette/tree/fix-repeated-base_url). We were using this in prod for a while (before we changed how we use datasette) and didn't run into any problems.

pip install git+https://github.com/brandonrobertz/datasette@fix-repeated-base_url

brandonrobertz avatar Oct 26 '24 16:10 brandonrobertz

Just to note, I'm getting bit by this as well -- Kubernetes deployment behind an Nginx-based ingress controller, so notionally similar to being behind a reverse-proxy.

Older likely-related issues, just to reference them: #1619 , #2105

mgsimpson-i2 avatar Feb 03 '25 20:02 mgsimpson-i2

For now, I am using ASGI routing as a workaround, just stripping off the extra uri path component if it's doubled, and it seems to be acceptable for now even though it's hacktastic

EternityForest avatar Feb 04 '25 09:02 EternityForest

In case anyone else is in the same boat as me -- was able to remediate this (not actually fix, but make bearable) by adding a second Ingress configuration to my Kubernetes deployment that used the Nginx-based ingress controller to do some internal URL rewriting that detects the doubled base_url and un-doubles it. The YAML that I used for the Ingress looks like the following, where in my specific case base_url is set to /v1/data:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: "/v1/data$1"
spec:
  ingressClassName: public-web         # this is specific to my deployment, YMMV
  tls:
    - hosts:
        - my.hostname.com                      # obfuscated, YMMV
      secretName: my-secret                 # obfuscated, YMMV
  rules:
  - host: my.hostname.com
    http:
      paths:
      - pathType: ImplementationSpecific
        path: "/v1/data/v1/data(/.*)"
        backend:
          service:
            name: my-service                    # obfuscated, YMMV
            port:
              number: 1234                         # obfuscated, YMMV

Hopefully that's enough to get you started if you're trying to solve a similar problem. The important parts are the annotations that turn on URL rewriting and specify the rewrite rule, plus the pathType and path settings that catch and capture the doubled path segments in the incoming URL.

A couple of links that were useful while I was working on the fix:

  • https://serverfault.com/questions/1028607/kubernetes-nginx-ingress-how-to-redirect-foo-example-org-to-example-org
  • https://kubernetes.github.io/ingress-nginx/examples/rewrite/

mgsimpson-i2 avatar Feb 04 '25 22:02 mgsimpson-i2

I'm going to see what needs to happen to get this into PR shape (https://github.com/brandonrobertz/datasette/tree/fix-repeated-base_url). If anyone used my branch LMK!

brandonrobertz avatar Feb 05 '25 17:02 brandonrobertz