Problem with POST resource/download endpoint when HistomicsUI parameter restrict_downloads = True

Open pearcetm opened this issue 2 years ago • 11 comments

I am working on a project using the Digital Slide Archive (https://github.com/DigitalSlideArchive/digital_slide_archive) which uses Girder as a backend. I am trying to track down the root of a problem I have using the "download checked resources" functionality in the default web interface. The problem seems to be that the cookie containing girderToken isn't being accepted as authentication, because when I try this, I get the following response:

{
    "message": "You must be logged in or have a valid auth token.",
    "type": "access"
}

Other methods in the user interface, like downloading the whole folder, work OK because they are sent to an endpoint with a Girder-Token header, but they don't let me choose only specific items to download.

I suspect the problem is in the way the cookie is being forwarded (or not). My nginx.conf for this endpoint is as follows:

    # pass /api/v1/ on to the DSA instance
    location /api/v1/ {
        proxy_pass         http://dev-dsa;
        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
        proxy_set_header   Cookie $http_cookie;

        client_max_body_size 100m; # set this to match the girder application (uses cherrypy, default 100m)
    }

Any advice on how to configure this properly would be much appreciated. I'm happy to attempt to provide additional information as needed.

pearcetm avatar May 25 '22 21:05 pearcetm

Here are the recommended instructions for serving Girder with Nginx as a reverse proxy:

https://girder.readthedocs.io/en/latest/deployment-alternatives.html#reverse-proxy

zachmullen avatar May 26 '22 11:05 zachmullen

Thanks for pointing me to that link. Unfortunately, even when configuring nginx and girder options as described, I am still getting the same permissions error when trying to download selected resources.

If I add the token (from the cookie) as part of the query string or into the body of the POST request, it succeeds. It is only the cookie-based authentication method that is causing the problem.
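For anyone hitting the same error, the workaround of putting the token in the POST body can be sketched with the Python standard library. This only builds the request and does not send it. The host name, token value, item ID, and the parameter names `token` and `resources` are assumptions for illustration and should be checked against the REST API page of your own instance.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_download_request(api_root, token, item_ids):
    """Build (but do not send) a POST to resource/download with the token in the body."""
    form = {
        # Assumed parameter name: JSON mapping of resource type to a list of IDs.
        'resources': json.dumps({'item': item_ids}),
        # Assumed parameter name: the value copied from the girderToken cookie.
        'token': token,
    }
    return Request(
        f"{api_root.rstrip('/')}/resource/download",
        data=urlencode(form).encode('ascii'),
        method='POST',
    )


# Hypothetical values for illustration only.
req = build_download_request(
    'https://dsa.example.org/api/v1', 'TOKEN_FROM_COOKIE', ['ITEM_ID'])
sent = parse_qs(req.data.decode('ascii'))
```

Passing the token explicitly like this sidesteps cookie handling entirely, which is why it helps narrow the problem down to the cookie-based path.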

pearcetm avatar May 26 '22 13:05 pearcetm

That's unexpected, since the folder download endpoint also relies on cookie authentication; both endpoints are decorated to allow cookie auth. Can you confirm that downloading a folder as a zip is working? @manthey could something DSA-specific be interfering?

zachmullen avatar May 26 '22 13:05 zachmullen

Interesting - I guess it isn't anything to do with the cookie then. Downloading the whole folder works fine, it is only when I try to download checked items that it gives the authentication error. I see that the folder download is a GET request while the checked resources is POST. Could that be contributing somehow?

pearcetm avatar May 26 '22 13:05 pearcetm

I see that the folder download is a GET request while the checked resources is POST. Could that be contributing somehow?

It's possible. I just tried this on a different (out-of-the-box) Girder instance, and it used the POST method and sent the cookie, and worked to download checked private resources. So if I am going to reproduce it I might need more info; what browser & version are you using?

zachmullen avatar May 26 '22 14:05 zachmullen

There is a setting in the default deployment of the DSA to restrict some download endpoints. I'd have to look up the details, but this is specified in the girder.cfg file. Can you try turning it off and seeing if it works? The offending lines are probably:

[histomicsui]
# If restrict_downloads is True, only logged-in users can access download and
# tiles/images endpoints.
restrict_downloads = True

manthey avatar May 26 '22 14:05 manthey

On the REST API page I see "API version: 3.1.14" at the bottom. Chrome and Safari both show the same behavior. I'm currently using macOS, but I have also tried from Windows and get the same permissions error.

pearcetm avatar May 26 '22 14:05 pearcetm

There is a setting in the default deployment of the DSA to restrict some download endpoints. I'd have to look up the details, but this is specified in the girder.cfg file. Can you try turning it off and seeing if it works? The offending lines are probably:

[histomicsui]
# If restrict_downloads is True, only logged-in users can access download and
# tiles/images endpoints.
restrict_downloads = True

Commenting out restrict_downloads = True allows me to download checked resources.

I don't quite understand the full implications of disabling this. Limiting downloads to logged-in users seems like a good idea, but maybe it doesn't change the application's behavior very much if permissions on the folders themselves are already restricted to logged-in users?

pearcetm avatar May 26 '22 14:05 pearcetm

By default, if a user can look at a file, they can download it. One of our primary collaborators on DSA wanted to allow anonymous users to view images but not download them as they were concerned about egress bandwidth.

manthey avatar May 26 '22 15:05 manthey

Limiting download permissions does seem useful. It looks like the restrict_downloads flag causes the following (from histomicsui/__init__.py):

    if curConfig.get('restrict_downloads'):
        # Change some endpoints to require token access
        endpoints = [
            ('collection', 'GET', (':id', 'download')),
            ('file', 'GET', (':id', 'download')),
            ('file', 'GET', (':id', 'download', ':name')),
            ('folder', 'GET', (':id', 'download')),
            ('item', 'GET', (':id', 'download')),
            ('resource', 'GET', ('download', )),
            ('resource', 'POST', ('download', )),

            ('item', 'GET', (':itemId', 'tiles', 'images', ':image')),
        ]

        for resource, method, route in endpoints:
            cls = getattr(info['apiRoot'], resource)
            boundfunc = cls.getRouteHandler(method, route)
            func = getattr(boundfunc, '__func__', boundfunc)
            if func.accessLevel == 'public':
                newfunc = access.token(func)
                newfunc.requiredScopes = getattr(func, 'requiredScopes', None)
                if getattr(func, 'requiredScopes', None):
                    del func.requiredScopes
                if getattr(func, 'cookieAuth', None):
                    newfunc.cookieAuth = True
                    del func.cookieAuth
                # Rebind new function
                if boundfunc != func:
                    newfunc = newfunc.__get__(boundfunc.__self__, boundfunc.__class__)
                    setattr(newfunc.__self__, newfunc.__name__, newfunc)
                cls.removeRoute(method, route)
                cls.route(method, route, newfunc)

I still don't understand how downloading an entire folder with a GET request succeeds but downloading selected resources with a POST request fails, since both routes seem to be modified the same way here.

pearcetm avatar May 26 '22 17:05 pearcetm

Here's a thought - see comments in the code below:

    if curConfig.get('restrict_downloads'):
        # Change some endpoints to require token access
        endpoints = [
            ('collection', 'GET', (':id', 'download')),
            ('file', 'GET', (':id', 'download')),
            ('file', 'GET', (':id', 'download', ':name')),
            ('folder', 'GET', (':id', 'download')),
            ('item', 'GET', (':id', 'download')),
            ('resource', 'GET', ('download', )),
            ('resource', 'POST', ('download', )),

            ('item', 'GET', (':itemId', 'tiles', 'images', ':image')),
        ]

        for resource, method, route in endpoints:
            cls = getattr(info['apiRoot'], resource)
            boundfunc = cls.getRouteHandler(method, route)
            func = getattr(boundfunc, '__func__', boundfunc)
            if func.accessLevel == 'public':
                newfunc = access.token(func)
                newfunc.requiredScopes = getattr(func, 'requiredScopes', None)
                if getattr(func, 'requiredScopes', None):
                    del func.requiredScopes
                if getattr(func, 'cookieAuth', None):
                    # <-- Correctly allows cookieAuth on resource/download GET
                    newfunc.cookieAuth = True
                    # <-- This deletes the cookieAuth setting on the original
                    # function. POST is also bound to this function, so the next
                    # time through the loop, when updating POST resource/download,
                    # cookieAuth is no longer True.
                    del func.cookieAuth
                # Rebind new function
                if boundfunc != func:
                    newfunc = newfunc.__get__(boundfunc.__self__, boundfunc.__class__)
                    setattr(newfunc.__self__, newfunc.__name__, newfunc)
                cls.removeRoute(method, route)
                cls.route(method, route, newfunc)

I'm not totally sure I understand how this all works but here goes...

GET and POST resource/download are both originally bound to the same download function (with cookieAuth=True). Looping over the endpoints here doesn't just copy the settings to the new function; it also modifies the original by deleting cookieAuth from it. When GET resource/download is processed, the original function is still unmodified, so its settings are copied correctly. But the next time through the loop, when POST is being updated, cookieAuth is no longer defined on the original, so the new POST handler never gets it. Hence the permissions error...
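To check this reasoning outside of Girder, here is a minimal sketch of the suspected mechanism. Everything in it (`token_wrap`, `restrict`, `make_routes`, the route dictionary) is a hypothetical stand-in and not a Girder or HistomicsUI API; it only mimics the pattern of one function object shared by two routes, with an attribute deleted from the original inside the loop.

```python
def token_wrap(func):
    """Hypothetical stand-in for access.token: wrap func and mark it token-protected."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    wrapper.accessLevel = 'token'
    return wrapper


def restrict(routes, delete_from_original):
    """Re-register each public handler through token_wrap, like the quoted loop."""
    for key, func in list(routes.items()):
        if getattr(func, 'accessLevel', None) != 'public':
            continue
        newfunc = token_wrap(func)
        if getattr(func, 'cookieAuth', None):
            newfunc.cookieAuth = True
            if delete_from_original:
                # Mutates the shared original, so later routes that reuse
                # the same function no longer see cookieAuth.
                del func.cookieAuth
        routes[key] = newfunc


def make_routes():
    """Two routes that share one handler function, as GET and POST download do."""
    def download():
        return 'zip'
    download.accessLevel = 'public'
    download.cookieAuth = True
    return {('GET', 'download'): download, ('POST', 'download'): download}


def cookie_flags(routes):
    return {method: getattr(f, 'cookieAuth', False) for (method, _), f in routes.items()}


buggy = make_routes()
restrict(buggy, delete_from_original=True)
buggy_flags = cookie_flags(buggy)   # {'GET': True, 'POST': False}

fixed = make_routes()
restrict(fixed, delete_from_original=False)
fixed_flags = cookie_flags(fixed)   # {'GET': True, 'POST': True}
```

In this sketch, leaving the attribute on the original function is enough for both routes to keep cookie authentication. That is only one possible way to avoid the problem; whether it is appropriate for the real plugin depends on why the attribute is deleted there in the first place.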

Does that seem right?

pearcetm avatar May 26 '22 18:05 pearcetm