dspace-angular
Ensure Mirador viewer sends DSpace Authorization header
References
- Fixes #1435
Description
Adds a request preprocessor function to Mirador viewer configuration. If a DSpace auth token is present, the function adds an authentication header to the request.
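As a rough illustration of the idea described above, and not the code in this PR, a Mirador request preprocessor receives the request URL and fetch-style options and returns the options to use. In the sketch below, `makeAuthPreprocessor`, `getToken`, `isDspaceIiifUrl`, the `/iiif/` URL test and the `example-jwt` value are all hypothetical placeholders. The `Bearer` scheme is assumed to match how the DSpace REST API expects its token.

```javascript
// Sketch of a Mirador request preprocessor that adds a DSpace auth header.
// ASSUMPTIONS: getToken and isDspaceIiifUrl are hypothetical callbacks supplied
// by the embedding page; they are not part of Mirador or DSpace.
function makeAuthPreprocessor(getToken, isDspaceIiifUrl) {
  // Mirador calls each preprocessor with (url, options), where options are
  // fetch-style request options. The returned object is used for the request.
  return function addAuthHeader(url, options) {
    const token = getToken();
    // No token, or a request that does not go to the DSpace IIIF API:
    // hand the options back unchanged.
    if (!token || !isDspaceIiifUrl(url)) {
      return options;
    }
    return {
      ...options,
      headers: {
        ...(options && options.headers),
        Authorization: `Bearer ${token}`,
      },
    };
  };
}

// Hypothetical wiring into a Mirador configuration object.
const miradorConfig = {
  requests: {
    preprocessors: [
      makeAuthPreprocessor(
        () => 'example-jwt',            // stand-in for reading the stored token
        (url) => url.includes('/iiif/') // stand-in for "is this the DSpace IIIF API?"
      ),
    ],
  },
};
```

Returning the original options object when no token is present keeps anonymous requests identical to what Mirador would send without the preprocessor.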
Instructions for Reviewers
List of changes in this PR:
- Updated Mirador index.js
Include guidance for how to test or review your PR.
This is a minor change to viewer configuration only. To test, run yarn run build:mirador and open a restricted IIIF DSpace item as the authorized user.
Checklist
This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!
- [x] My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
- [x] My PR passes TSLint validation using yarn run lint
- [x] My PR doesn't introduce circular dependencies
- [ ] My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
- [ ] My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
- [ ] If my PR includes new, third-party dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
Hi @tdonohue , this is an interesting question. The preprocessing function adds the auth header to requests made from Mirador to the IIIF endpoint. I tested it by placing access restrictions on the item dso, not on individual bitstreams, and that part seems to work fine. Mirador is able to create and retrieve a complete IIIF manifest from the item, bundle and bitstream metadata.
If you add the restriction to bitstreams (and not the item) we could return an error code from REST when no bitstreams are available and hide Mirador viewer when there's nothing to show. This might be something we add in 7.3+. It would require a bit of work on the REST and angular side but should be easy to do.
I'm pretty certain the 403 error you are seeing happens when the Cantaloupe image server tries to read the bitstream content. Obviously the request from Cantaloupe doesn't have the authorization header. At the moment, the only solution I've considered is configuring DSpace authorization for IP-based access to bitstreams using a special group.
@tdonohue , quick question. If we send the authorization header to the image server could we then pack the token into the image server request to dspace? Not sure that's allowed. If it can be done that would solve the 403 bitstream problem...
@mspalti : Essentially, yes. If the image server request to DSpace just forwarded (or copied) the same Authorization header, then things would likely work fine. The current problem seems like the Mirador viewer sends the Authorization header along (including to the image server), but the image server ignores it and the request to the DSpace backend is therefore unauthenticated. At least that's my best guess.
There's only one catch that I can think of. The Image server would likely need to be added to rest.cors.allowed-origins: https://github.com/DSpace/DSpace/blob/main/dspace/config/modules/rest.cfg#L11 Otherwise, it's still possible the REST backend will not trust the image server. But, it seems completely reasonable to me to require the Image server be trusted for everything to work.
Great! The function in this PR is not adding the header to requests to the image server but if I remove a constraint then it will be added. Cantaloupe supports a scripted strategy that we can probably use to add the authorization header to the DSpace request. I'll experiment with that after the holiday. Thanks!
@tdonohue , this turned into a tricky problem, but I think I have some answers now.
Obviously it's easy to add the authorization header to DSpace IIIF API requests for manifests, annotationLists, etc. I think this is the important feature for folks in the digitization / cultural resources community (which will be the largest IIIF user group). For these users, I think the ability to add bitstream-level restrictions is a low-level concern but restricting access to items will be important.
For other user groups with more traditional IR needs and expectations bitstream-level access restrictions can be supported but it's a bit trickier. Here's what I've discovered so far:
- You need to use a Cantaloupe delegate script with the "pre_authorize" and "httpsource_resource_info" lifecycle methods.
- The initial request from Mirador needs a custom header that carries the auth token.
- In my tests, the custom header was sent only in the initial Mirador request to the server and not subsequent requests from OpenSeadragon. However, if the image server and the angular UI server share the same cookie domain then the dsAuthInfo cookie is available in subsequent requests and can be used.
- One catch is that the Cantaloupe server needs additional CORS configuration for the initial custom header to be allowed. That's not a big change. Once we verify things a bit more it would be an easy Cantaloupe issue/pr.
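To make the cookie step concrete, here is a logic-only sketch of turning the dsAuthInfo cookie into an Authorization header. A real Cantaloupe delegate is a Ruby script, so this JavaScript only illustrates the parsing logic. It assumes the cookie value is URL-encoded JSON with an `accessToken` field, which should be verified against a live cookie. The function names `tokenFromCookieHeader` and `headersForDspaceRequest` are hypothetical.

```javascript
// Logic sketch only: the actual Cantaloupe delegate would be written in Ruby.
// ASSUMPTION: dsAuthInfo holds URL-encoded JSON such as {"accessToken":"...","expires":...}.
function tokenFromCookieHeader(cookieHeader) {
  if (!cookieHeader) return null;
  for (const part of cookieHeader.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    const name = part.slice(0, eq).trim();
    if (name !== 'dsAuthInfo') continue;
    try {
      const info = JSON.parse(decodeURIComponent(part.slice(eq + 1).trim()));
      return typeof info.accessToken === 'string' ? info.accessToken : null;
    } catch (e) {
      // Malformed cookie: treat the request as anonymous.
      return null;
    }
  }
  return null;
}

// Build the extra headers the image server would send when it asks DSpace
// for the bitstream. An empty object means "send the request anonymously".
function headersForDspaceRequest(incomingHeaders) {
  const token = tokenFromCookieHeader(incomingHeaders['Cookie']);
  return token ? { Authorization: `Bearer ${token}` } : {};
}
```

Falling back to an anonymous request when the cookie is missing or unreadable leaves the access decision with DSpace rather than with the image server.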
This all seems doable but a heavy lift in the near term. My thought is to add DSpace IIIF API authorization for 7.2 and create a follow up issue for the image-server bitstream work.
I added the preprocessing for Mirador bitstream authorization to this PR. Since it's fairly tentative we could take it out for now or we could leave it and note that it's a work-in-progress.
BTW, I actually don't understand why in my tests the dsAuthInfo cookie is not available in all requests to the image server. Or why the custom header was not added to all requests.
If there's a way to use cookies in all requests then bitstream access control just requires a bit of additional Cantaloupe configuration which could be described in documentation. If the custom header can be made to work in all cases that would remove the shared cookie domain requirement. It might be worth a closer look at how Mirador works. But at this point I don't know what we'd find.
I tested with a production site on which dspace and the Cantaloupe image server run behind a reverse proxy. As expected, the dsAuthInfo cookie was available for all requests. I was able to use the cookie to pre_authorize requests and retrieve restricted bitstreams using a delegate file.
I'm going to remove the extra Mirador preprocessor for image server requests since it now seems both unnecessary and problematic.
I added a postprocessor to provide a more meaningful error message if the manifest contains no images. It would be better if this relied on an API response code. But for now, it works to check the JSON response.
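A minimal sketch of that kind of check, assuming the manifest follows the IIIF Presentation 2 layout (`sequences[].canvases[]`) and that the postprocessor is called with `(url, action)` as described in the Mirador settings. The `dspaceMessage` field and both function names are hypothetical, and how the message is shown to the user is not covered here.

```javascript
// ASSUMPTION: IIIF Presentation 2 manifest shape, i.e. sequences[].canvases[].
function countCanvases(manifestJson) {
  const sequences = (manifestJson && manifestJson.sequences) || [];
  return sequences.reduce((n, seq) => n + (seq.canvases || []).length, 0);
}

// Mirador postprocessors receive (url, action) and may modify the action.
// This one only annotates manifest responses that contain no canvases.
function flagEmptyManifest(url, action) {
  if (action.type !== 'mirador/RECEIVE_MANIFEST') return;
  if (countCanvases(action.manifestJson) === 0) {
    // Hypothetical field: the viewer wrapper would need to read and display it.
    action.dspaceMessage = 'No images are available for this item.';
  }
}
```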
This pull request introduces 1 alert when merging c805969505e45004b5f3559eb0626b70ff4491bc into 710d8931874f10eef899b82fa2678413bdaf9e73 - view on LGTM.com
new alerts:
- 1 for Unused variable, import, function or class
Yes, I agree that the iiif authentication API will be necessary if we want to fully support access restriction at the bitstream level. I'm sure there are possible authentication use cases that DSpace will want to support eventually!
But I'm still sort of asking myself the philosophical question. Do we have any use cases at present that require this level of control over individual image content? To me there's a difference between use cases that arise out of IIIF community usage and the kinds of use cases already supported in the default DSpace Item view. The IIIF integration is about the former not necessarily the latter.
4Science might be in the best position to know what advanced IIIF use cases are in demand right now. From my perspective it may be enough to just require all bitstreams added to an IIIF manifest have anonymous read access.
So as noted above I discovered another issue with restricting bitstreams. We cache manifests for better performance and currently have no way to manage the cache in response to varying user permissions. That might require some research.
I'm away for the day. Here's a quick summary of where I think we stand on the question of bitstream restrictions.
Basically, dspace bitstream and bundle restrictions do not work at all with iiif-enabled items because of our current approach to caching. If the caching issue is fixed, then it will be possible for local institutions to configure their image server to use the dspace authorization token. Institutions who do not configure their image server will need to avoid bundle and bitstream access restrictions since as @tdonohue discovered they produce viewer errors.
These statements are true only for the embedded Mirador viewer and not manifests that are shared with an external viewer. Full IIIF interoperability will require a dspace implementation of the IIIF Authentication API. That's beyond the current scope of our efforts but worth investigating.
Meanwhile, this PR does allow the embedded viewer to access Items that are restricted at the Item level. That's an important enhancement.
My personal feeling is that this PR is not yet ready to go in an official release.
Our IIIF implementation assumes that each DSpace item has at most a single IIIF manifest associated with it, which allows us to cache the response regardless of who requested it. Security is still in place because it is verified before the cache is accessed.
Providing access to the IIIF manifest of a restricted Item has no real value if we cannot also grant the viewer access to the restricted bitstreams, since the bitstreams usually have stronger restrictions than the item. We wouldn't suggest protecting the item metadata while leaving the bitstreams open as a good security practice.
@mspalti you say that the caching issue currently prevents us from managing restricted bitstreams; I'm not sure what your idea is here. In any case I would discourage you from including any authorization token in the manifest document itself, as the manifest document is sometimes shared directly between researchers as a JSON file, uploaded to other systems, or harvested by other systems as well.
@mspalti and @abollini : As it sounds to me like there's still a lot to be figured out / discussed regarding this feature (especially with caching, etc), I'd recommend we simply reschedule this for 7.3 at this time. That means that 7.2 will just have the same behavior as 7.1, in that IIIF items must have publicly accessible bitstreams.
Yes, I agree that we are not ready to merge this one. Not enough time for consideration.
For the sake of further analysis:
@tdonohue , @abollini , there's no suggestion here that we include an authorization token in the manifest. The PR solves only one problem and in a way that's not intended to support IIIF interoperability. Maybe a concrete example will help make the intent clear. Say I'm an archivist and I've added a new IIIF enabled item to DSpace. I do not want it to be available to everyone (or interoperable) so I restrict access to the item. When I log in as an authorized user, I can see the DSpace item and the embedded Mirador viewer. But if the embedded viewer cannot provide the authorization token in its request to fetch the manifest, then the viewer fails and I can't see my stuff. That's frustrating. Letting Mirador add the authorization token to the request solves this problem. Only this problem and not the others we've been discussing. It also doesn't work if the manifest is used in some other system.
Actually, this is the only problem I considered when I suggested the solution. I still think it's a good one for this one (important) issue.
Bitstream restrictions and IIIF interoperability are both bigger problems as we've discovered. There are several issues with bitstream restrictions that I can see:
- Bitstreams are requested by the image server. If bitstreams are restricted, then these requests require an authorization token. This PR can't address that problem since it's not a DSpace issue. That said, in the case of Cantaloupe it's fairly easy to do if the request includes a DSpace cookie. But it's up to the local institution to get things working in a secure way. DSpace can't support it directly. Strategies for local configuration will vary with image server used.
- Caching is an issue (I think). Here's an example. An item contains 2 images. One anonymous read and one restricted. I access the item anonymously and the manifest is returned with a single image. But I want both, so I log in. The cached manifest is returned. Again with a single image. Because the manifest is cached I won't get access to the second image. The reverse could also happen, in which case an anonymous user receives a cached manifest that contains a restricted image and sees a viewer error.
- It's currently possible to return a manifest with zero canvases if all bitstreams are restricted. I think this is something we should disallow at the API level and return an error code that can be handled gracefully in the Mirador viewer. (I haven't checked to see what the standard says.)
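The caching scenario in the list above can be reproduced with a toy model. This is not DSpace code: the item, the bitstream names, the `authorized` flag and the in-memory `Map` are all made up for illustration. The only property taken from the discussion is that the cache is keyed by item alone.

```javascript
// Toy model: one item with a public and a restricted image.
const items = {
  'item-1': [
    { name: 'public.jpg', restricted: false },
    { name: 'restricted.jpg', restricted: true },
  ],
};

// Build a manifest containing only the bitstreams this user may read.
function buildManifest(itemId, user) {
  const visible = items[itemId].filter((b) => !b.restricted || user.authorized);
  return { itemId, canvases: visible.map((b) => b.name) };
}

// Cache keyed by item ID only, so the first caller's view is served to everyone.
const cache = new Map();
function getManifest(itemId, user) {
  if (!cache.has(itemId)) {
    cache.set(itemId, buildManifest(itemId, user));
  }
  return cache.get(itemId);
}

const anonymousView = getManifest('item-1', { authorized: false });
const loggedInView = getManifest('item-1', { authorized: true });
// Both views list a single canvas, because the second call hits the cache.
console.log(anonymousView.canvases.length, loggedInView.canvases.length);
```

Swapping the order of the two calls gives the reverse failure: the anonymous user receives a manifest that lists the restricted image.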
It would be great to have a solution to the bitstream problem. But that might require something like implementing the IIIF Authentication API. Well beyond the scope of this PR.
I'm not clear how important bitstream restrictions are in the context of our IIIF integration. I'm convinced they are important! But I don't have a sense of how important or the level of priority for future work.
@mspalti : Was reminded of this work as we went through the 7.3 board in today's meeting, so I re-read your last comment here.
While I understand the goal you were trying to achieve, I think you are making an assumption that most people would place the access restrictions at the Item level rather than the Bitstream level in your described use case. If the archivist you described decided instead to create a public Item with a restricted Bitstream, then you'd see the same behaviors you describe even with this PR in place. The archivist would still be frustrated that they cannot see the Bitstream in the Mirador viewer.
So, my worry here is that we are assuming that all users will provide access restrictions only at the Item level (and only seeking to solve that smaller problem). If someone instead accidentally (or purposefully) restricts the Bitstream, they will be confused as to why the Mirador viewer suddenly doesn't work -- even if it works fine for a different restricted Item (where the restrictions are only at the Item level).
I think we all (you, @abollini and I) see that the current behavior is problematic. But, I'm hesitant to apply a fix that only works if Items are restricted, but won't work if Bitstreams are restricted. I'd rather us try and find a way to minimally get Item & Bitstream restrictions working.... and we can always follow up that work with a full implementation of IIIF Authentication API in a separate PR at a later time (if that's a much larger task)
For now, I'm going to flag this PR as a work in progress. If it helps though, I'm glad to try to find time to discuss this problem in more detail in a future meeting...or we can further brainstorm in this PR or the associated issue ticket.
Sorry I missed your last comment @tdonohue . I am totally fine with designating this a work in progress!
Based on my previous comments it seems we can handle bitstream access by passing the JWT token to the image service, and configuring the image service accordingly. The technical problem right now is that our cache system isn't able to return different versions of a manifest for users with different credentials. It's an all or nothing cache without a notion of tiered access. That can probably be remedied.
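Purely as an illustration of what a notion of tiered access could look like, and not something proposed or implemented in this thread, a cache could key each manifest on the item plus a fingerprint of the requester's groups. Every name below (`manifestCacheKey`, `TieredManifestCache`, the group IDs) is hypothetical.

```javascript
// Hypothetical cache key: item ID plus a sorted, de-duplicated group list.
function manifestCacheKey(itemId, groupIds) {
  const tier = [...new Set(groupIds)].sort().join(',');
  return `${itemId}|${tier}`;
}

// Minimal cache that stores one manifest per (item, group set) pair.
class TieredManifestCache {
  constructor(build) {
    this.build = build; // function (itemId, groupIds) => manifest
    this.entries = new Map();
  }

  get(itemId, groupIds) {
    const key = manifestCacheKey(itemId, groupIds);
    if (!this.entries.has(key)) {
      this.entries.set(key, this.build(itemId, groupIds));
    }
    return this.entries.get(key);
  }
}
```

The trade-off is more cache entries per item; whether that is acceptable would depend on how many distinct group combinations actually occur.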
I agree that the Authentication API is a bigger problem and one that needs to be addressed eventually. A solution will depend in some ways on the issues we're discussing here.
A quick addition to this conversation.
I needed access restrictions for a collection of licensed images and was able to test the mirador and cantaloupe configuration mentioned above. It works. The cantaloupe configuration is similar to the mirador config in this PR: it uses the dsAuthInfo cookie to set the JWT before making the request to DSpace. (If cantaloupe is using a cache as would be typical, it needs to be configured to always check DSpace before returning the cached image.)
As @abollini noted, this particular cookie-based approach works only for the embedded viewer and is not consistent with the IIIF protocol for authentication. Also, it requires configuration of the image server as well as the viewer. So I'm thinking of this as a configuration recipe that one can use in lieu of full support for the IIIF Authentication API (and perhaps in combination with it when/if it becomes available in DSpace). I wouldn't recommend modifying our default Mirador configuration file index.js for all the reasons discussed earlier.
There's still the problem of Item and Bitstreams with different permissions (say a public item with a public low-res image and a restricted high-res image) because we can have only one version of the Manifest in the backend cache, but that seems less common and not the primary use case anyway.