Authorization/Permissions

Open saulshanabrook opened this issue 4 years ago • 12 comments

The simplest implementation of a shared data model assumes everyone is trusted and everyone wants to access all the same information. This works for the current JupyterLab UX. Any permissions are pushed down through the Jupyter Server to the underlying file system, which enforces them.

User stories

However, there are a number of user stories that require some form of permissions to be built into the datastore itself.

One of these is a read only mode brought up by @vidartf recently:

I'm still thinking about the use case where you want to invite a group of people to follow-along your notebook edits in read only mode (e.g. when teaching a class/tutorial), but not being able to access any other information about my session or system.

Another is to support the WIP new authorization layer in Jupyter Server (https://github.com/jupyter/jupyter_server/pull/165) articulated by @Zsailer. This would allow for fine-grained authorization at the server level for different actions on different resources, like "Is it allowed for user X to open file Y?".

Possible Solutions

One very simple permission system would be to support a read-only mode for the RTC relay. For example, you could start it with two authorization tokens, one that would allow R/W and one for read-only access. If you connected with the read-only token, you would still receive all the broadcast transactions, but you would not be able to send new transactions.

This would cover a simple use case, possibly similar to the one Vidar was thinking about. It would not address a more nuanced authorization scheme.
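
To make the two-token idea concrete, here is a minimal in-memory sketch. It is not the actual relay; the `Relay` class, its method names, and the token strings are all made up for illustration.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


@dataclass
class Relay:
    """Toy relay (hypothetical): two tokens, one per access level."""
    rw_token: str
    ro_token: str
    log: list = field(default_factory=list)      # accepted transactions, in order
    clients: dict = field(default_factory=dict)  # client id -> (Access, inbox)

    def connect(self, client_id: str, token: str) -> Access:
        # Constant-time comparison so the check does not leak token contents.
        if secrets.compare_digest(token, self.rw_token):
            access = Access.READ_WRITE
        elif secrets.compare_digest(token, self.ro_token):
            access = Access.READ_ONLY
        else:
            raise PermissionError("unknown token")
        # A new client starts with a copy of everything broadcast so far.
        self.clients[client_id] = (access, list(self.log))
        return access

    def send(self, client_id: str, transaction: dict) -> None:
        access, _ = self.clients[client_id]
        if access is not Access.READ_WRITE:
            raise PermissionError("read-only clients cannot send transactions")
        self.log.append(transaction)
        # Broadcast to every connected client, read-only ones included.
        for _, inbox in self.clients.values():
            inbox.append(transaction)

    def inbox(self, client_id: str) -> list:
        return self.clients[client_id][1]
```

A client that connects with the read-only token sees every transaction in its inbox, but any call to `send` from it raises `PermissionError`.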

saulshanabrook avatar Jun 08 '20 15:06 saulshanabrook

A brief aside—

Another server-side option that's been explored is the WebDAV protocol.

This is more in scope for something like hubshare. Users can access the same docs but never edit them simultaneously because it places a lock on a file when someone opens it. When that user closes that doc, the lock is removed and someone else can pick it up. This allows collaborators to work on documents together without collision or real-time collaboration.
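
As a rough illustration of that lock-on-open behaviour (this is not a WebDAV client or server; `LockManager` and its methods are hypothetical names):

```python
class LockManager:
    """Toy exclusive lock per document path (illustrative only)."""

    def __init__(self):
        self._holders = {}  # path -> user currently holding the lock

    def open(self, user: str, path: str) -> bool:
        # The first user to open a path takes the lock; others are refused.
        holder = self._holders.setdefault(path, user)
        return holder == user

    def close(self, user: str, path: str) -> None:
        # Only the current holder can release the lock.
        if self._holders.get(path) == user:
            del self._holders[path]
```

Once the holder calls `close`, the next `open` on the same path by someone else succeeds.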

Zsailer avatar Jun 08 '20 15:06 Zsailer

For RTC, the authentication + new authorization layer in Jupyter Server seems like the right server-side approach.

Note, this authorization layer can be applied to any python server extension (all subclasses of JupyterHandler can take advantage of this feature). If the patch server's websocket handlers are written in Python, you could use the authorization layer to control access to the patch server. Otherwise, we could create something similar in the underlying language+framework.

Zsailer avatar Jun 08 '20 15:06 Zsailer

Here are some other cases that I think would be fairly common:

Case 1:

  • I'm holding a tutorial for 20 users on my server (where it is a hassle to set up new users and permission them correctly)
  • I create a new RTC session for a notebook
  • I want to give anonymous/world read access to that specific session so all participants can follow along.
  • I want to give my co-presenter Alice full access to that session

Case 2:

  • I'm working at a university, and I'm running Jupyter on a campus-wide deployment.
  • For a class I want all the students registered for that class (I have a list of user names) to have read access to one or more sessions for that class (but not any other sessions I have or create).
  • I want to be able to grant write access for specific students at certain times during the presentation (and subsequently revoke it), e.g. so they can demonstrate a solution to a tutorial problem.

Case 3:

  • I'm working in a health company where different users have different data permissions.
  • I want to share a code-editing session with a colleague.
  • My co-worker should not be able to execute code on my kernel, nor should they be able to see outputs produced from executions on it.
  • Bonus: My co-worker should also be able to execute code on their own separate kernel, without me having access to that or the outputs produced from it.

I'm not saying we need to support every point of these cases out of the box, but they can be useful to think about so that we don't pick solutions that would prevent us from supporting such cases in the future. Or if they are deemed incompatible, then we should at least be able to make an informed choice when we deem certain cases unsupported.
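
One way to picture Cases 1 and 2 is a per-session access list. This is only a sketch under assumptions: `SessionACL`, the `"*"` wildcard for anonymous access, and the user names are invented for illustration, and nothing like this exists in the codebase as far as this thread shows.

```python
from dataclasses import dataclass, field

READ, WRITE = "read", "write"
ANYONE = "*"  # hypothetical wildcard principal for anonymous/world access


@dataclass
class SessionACL:
    """Permissions scoped to a single RTC session (illustrative only)."""
    owner: str
    grants: dict = field(default_factory=dict)  # user -> set of permissions

    def grant(self, user: str, *perms: str) -> None:
        self.grants.setdefault(user, set()).update(perms)

    def revoke(self, user: str, *perms: str) -> None:
        self.grants.get(user, set()).difference_update(perms)

    def allows(self, user: str, perm: str) -> bool:
        if user == self.owner:
            return True  # the session owner always has full access
        named = self.grants.get(user, set())
        world = self.grants.get(ANYONE, set())
        return perm in named or perm in world
```

Case 1 would be `grant(ANYONE, READ)` plus `grant("alice", READ, WRITE)`; Case 2 would be a `grant`/`revoke` pair for `WRITE` around a student's demo. Case 3 would presumably need extra permission names (say, for executing on a kernel or viewing outputs), which this sketch does not model.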

vidartf avatar Jun 09 '20 12:06 vidartf

Thank you Vidar for those, that's helpful!

saulshanabrook avatar Jun 09 '20 13:06 saulshanabrook

Authentication and authorization are messy for two reasons

  • you have to interact with an external authentication system and you often have to interact with several systems
  • a lot of the authorizations are fine-grained

For the authentication/authorization mechanism, this really has to be somehow "layered" onto the system, and the system needs to be able to deal with any external mechanism.

However, the thing that needs to be figured out is a model for the exact types of permissions that you want. For example, in a filesystem, the permissions are read, write, execute. It turns out that for collaborative systems the capabilities model is a lot more complicated. For example, you might want to allow a student to send an assignment but not to view other assignments.

Moodle does a good job defining roles.

What I'm doing for bitstation is that all of the user authentication is done via a proxy server. The jupyter process uses basic authentication and just knows the user, but it doesn't know how the proxy server knows that this is the user.

In addition to the use cases above, I have one special use case. I'm using jupyter as part of a corporate intranet, so what I have is one single sign-in that gives the user access to jupyter and several other services. Jupyter doesn't know about the other systems that the user has access to, but the user experience is that they are looking at jupyter as one part of a general set of services.

One particular issue is that I'm a small company, so I have a lot of guests/employees/vendors with various levels of access. This makes the situation different from a large company where you get an IT permission from a centralized IT system.

Something else that works well for me is to have the same software run on docker with different users. For example, my kids have used jupyter and I have them run on a separate virtual machine than the corporate stuff.

One thing that I'd like to be more familiar with is the academic work done on authentication models. Most of the models I've seen assume a large corporate/IT system and fail when you have ad-hoc people.

One final thing is that Google, Facebook, and Microsoft have become authentication services, and they have very convenient authentication systems. The issue is that you run into privacy issues and government regulatory requirements.

joequant avatar Jun 10 '20 23:06 joequant

@joequant Thanks for sharing your experience here!

For example, in a filesystem, the permissions are read, write, execute. It turns out that for collaborative systems the capabilities model is a lot more complicated. For example, you might want to allow a student to send an assignment but not to view other assignments.

That's right. I think it's Jupyter Server's job (and possibly JupyterLab's, for the UI) to provide a fine-grained permissions layer (i.e. read, write, execute permissions on each resource) to Jupyter's services. Then, anyone can add a layer above Jupyter Server's authorization layer that provides roles/groups that configure these permissions in some specific way.
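
A rough sketch of that layering, with everything here hypothetical: this is not Jupyter Server's API, and the `PermissionTable` class, the `ROLES` mapping, and the resource paths are made up to show the shape of the idea.

```python
from fnmatch import fnmatchcase


class PermissionTable:
    """Lower layer: flat (user, action, resource pattern) permissions."""

    def __init__(self):
        self._allowed = set()

    def allow(self, user: str, action: str, resource_pattern: str) -> None:
        self._allowed.add((user, action, resource_pattern))

    def is_authorized(self, user: str, action: str, resource: str) -> bool:
        # Answers questions like "may user X read resource Y?".
        return any(
            u == user and a == action and fnmatchcase(resource, pattern)
            for u, a, pattern in self._allowed
        )


# Upper layer: roles are just named bundles of lower-level permissions.
ROLES = {
    "instructor": [
        ("read", "contents/course/*"),
        ("write", "contents/course/*"),
        ("execute", "kernels/*"),
    ],
    "student": [("read", "contents/course/*")],
}


def assign_role(table: PermissionTable, user: str, role: str) -> None:
    for action, pattern in ROLES[role]:
        table.allow(user, action, pattern)
```

The point of the split is that the lower layer only ever sees individual permissions, so a different roles/groups scheme can be swapped in above it without touching the checks.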

JupyterHub is our provided solution for handling authentication that can interact with various authentication systems, and JupyterHub will likely remain our solution for authentication. Of course, anyone can swap out JupyterHub for their own authentication proxy server (sounds like this is what you're doing).

JupyterHub does not yet handle authorization, because i) all servers in JupyterHub are single-user servers so it wasn't needed and ii) Jupyter Server didn't provide the plumbing for fine-grained permissions.

With RTC around the corner, JupyterHub might move towards shared servers, and shared servers quickly raise the need for authorization. I think we'll see JupyterHub provide ways to pull+leverage groups/roles definitions from auth providers in the future.

Zsailer avatar Jun 11 '20 17:06 Zsailer

Sounds to me like we first need to address Authentication before going to Authorization. I was used to tell that a correct security model must support AAA (Authentication, Authorization, Auditing).

echarles avatar Jul 12 '20 20:07 echarles

The way that it looks, it seems to me that authentication and auditing are best done in other parts of the jupyter infrastructure, and that the key thing is authorization. One model for something that does authorization well, in my opinion, is Android smartphones, where there is a list of sensitive services and then a way of asking the user "do you want to give app X permission to access your camera?"

The thing about jupyter is that it has hooks for authentication and auditing, but there's no infrastructure to register services and limit access.

There are also UI/UX issues. One thing about Android phones is that the recent versions do a very good job of controlling access and asking the right questions, whereas I remember that old phones had awful UI/UX. One thing that I've seen is that applications with terrible UI/UX for authentication result in people just leaving the barn door open.

joequant avatar Jul 13 '20 09:07 joequant

Sounds to me like we first need to address Authentication before going to Authorization. I was used to tell that a correct security model must support AAA (Authentication, Authorization, Auditing).

@echarles yep! I sketched out a very vague outline for a two-level authentication model... The first level is to authenticate actual changes to the data model (editing a notebook cell). The second level is to authenticate actions caused by changes in the data model (executing a notebook cell).

The first level would have to be built in at the rtc level (below Jupyter), the second would be at the rtc-jupyter level.
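
Very roughly, and only as a sketch with made-up names (`DataStore`, `JupyterLayer`, and the `execution-request/...` key are assumptions, not the outline itself), the two levels could look like this:

```python
class DataStore:
    """Level 1 (rtc level): guards changes to the shared data model."""

    def __init__(self, editors):
        self.editors = set(editors)
        self.records = {}

    def update(self, user: str, key: str, value: str) -> None:
        if user not in self.editors:
            raise PermissionError(f"{user} may not change the data model")
        self.records[key] = value


class JupyterLayer:
    """Level 2 (rtc-jupyter level): guards actions triggered by changes."""

    def __init__(self, store: DataStore, executors):
        self.store = store
        self.executors = set(executors)
        self.executed = []  # stand-in for handing work to a real kernel

    def request_execution(self, user: str, cell_id: str) -> None:
        # The request is itself a data-model change, so level 1 applies first.
        self.store.update(user, f"execution-request/{cell_id}", user)
        # Level 2: the change is recorded, but the action may still be refused.
        if user not in self.executors:
            raise PermissionError(f"{user} may not execute cells")
        self.executed.append(cell_id)
```

Here someone who can edit but not execute can still write the request into the data model; it just never reaches the kernel.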

saulshanabrook avatar Jul 13 '20 14:07 saulshanabrook

@saulshanabrook Yeah, I have seen that, and I like the token and signed token notions. I have various things popping to my mind such as JWT token and https://github.com/jupyter/jupyter_server/issues/50. I am trying to percolate a bit more and land something as a picture in https://github.com/jupyterlab/rtc/pull/48

echarles avatar Jul 13 '20 16:07 echarles

@echarles awesome! I am glad it at least made some sense, always better to be able to work together on this sort of thing.

saulshanabrook avatar Jul 13 '20 16:07 saulshanabrook

See also https://github.com/jupyterlab/rtc/issues/132#issue-836538548

echarles avatar Mar 22 '21 16:03 echarles