[BUG] - Custom Keycloak groups not populated at the shared volume level
### Describe the bug
Since Nebari version 2024.3.3, it looks like custom groups created on Keycloak do not show up in the user's JupyterLab pod after re-spawning the instance. This appears to have been caused by the upgraded group-fetching logic introduced in https://github.com/nebari-dev/nebari/pull/2447.
### Expected behavior
At least until the new permission model update is in place, we expect that new groups created on Keycloak will automatically generate a subfolder in the shared directory of the user's home.
### OS and architecture in which you are running Nebari
Linux
### How to Reproduce the problem?
- Spin up the latest release of Nebari, or any version >= 2024.3.3;
- Create a custom group on Keycloak and assign a user to it;
- Logged in as that user, spin up a JupyterLab instance.
### Command output
No response
### Versions and dependencies used.
No response
### Compute environment
None
### Integrations
Keycloak
### Anything else?
This is most likely a parsing error in our current `base_profile_shared_mounts` function:
https://github.com/nebari-dev/nebari/blob/43fb770c822dab7c4b27d507f756aee971cd8983/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/services/jupyterhub/files/jupyterhub/03-profiles.py#L79
Before I discuss my findings further, here is an overview of what is happening under the hood, at least in the components related to this issue.
After updating to the latest OAuthenticator and the changes introduced in #2447 for Keycloak's group fetching, we started seeing issues with the group EFS shared subfolders: they do not appear in the user's JupyterLab after the instance is launched when the user belongs to a custom Keycloak group.
After checking our source code, I found that `base_profile_shared_mounts`, referenced below:
https://github.com/nebari-dev/nebari/blob/43fb770c822dab7c4b27d507f756aee971cd8983/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/services/jupyterhub/files/jupyterhub/03-profiles.py#L100-L133
expects the `groups` object to be extracted from the OAuth class's `user_oauth` object, here:
https://github.com/nebari-dev/nebari/blob/43fb770c822dab7c4b27d507f756aee971cd8983/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/services/jupyterhub/files/jupyterhub/03-profiles.py#L549-L552
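For context, `base_profile_shared_mounts` essentially turns each group the user belongs to into an extra shared-volume mount. A minimal sketch of that pattern (illustrative only; the names and paths here are my assumptions, not Nebari's actual implementation):

```python
# Illustrative sketch only -- not Nebari's actual implementation.
# Assumes groups arrive as bare names (e.g. "analyst") and that each
# group maps to a subfolder of the shared NFS/EFS volume.

def shared_mounts_for(groups: list[str]) -> list[dict]:
    """Build one volume mount per group, pointing at shared/<group>."""
    return [
        {
            "name": "shared",
            "mountPath": f"/home/jovyan/shared/{group}",
            "subPath": f"shared/{group}",
        }
        for group in groups
    ]

# If Keycloak instead hands back full paths like "/analyst", the
# f-strings above would yield "shared//analyst" -- the kind of subtle
# mismatch suspected here.
print(shared_mounts_for(["analyst"]))
```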
That object is generated upstream by the generic authenticator class:
```python
async def update_auth_model(self, auth_model):
    """
    Sets admin status to True or False if `admin_groups` is configured and
    the user isn't part of `admin_users` or `admin_groups`. Note that
    leaving it at None makes users able to retain an admin status while
    setting it to False makes it be revoked.

    Also populates groups if `manage_groups` is set.
    """
    if self.manage_groups or self.admin_groups:
        user_info = auth_model["auth_state"][self.user_auth_state_key]
        user_groups = self.get_user_groups(user_info)

        if self.manage_groups:
            auth_model["groups"] = sorted(user_groups)

        if auth_model["admin"]:
            # auth_model["admin"] being True means the user was in admin_users
            return auth_model

        if self.admin_groups:
            # admin status should in this case be True or False, not None
            auth_model["admin"] = bool(user_groups & self.admin_groups)

    return auth_model
```
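For reference, `get_user_groups` here resolves the groups out of the userinfo payload via `claim_groups_key`; a paraphrased, simplified sketch of that behaviour (not the verbatim upstream source):

```python
# Paraphrased sketch of the upstream behaviour, not the verbatim source.
# claim_groups_key selects where the groups live in the userinfo payload
# and may also be a callable.

def get_user_groups(user_info: dict, claim_groups_key="groups") -> set:
    if callable(claim_groups_key):
        return set(claim_groups_key(user_info))
    return set(user_info.get(claim_groups_key, []))

# With Keycloak, the values are whatever the token/userinfo carries --
# possibly group *paths* ("/analyst") rather than bare names.
print(get_user_groups({"groups": ["/analyst", "/developer"]}))
```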
It is later customized by our `KeycloakOAuthenticator` class, here: https://github.com/nebari-dev/nebari/blob/43fb770c822dab7c4b27d507f756aee971cd8983/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/services/jupyterhub/files/jupyterhub/04-auth.py#L11
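Broadly, that subclass overrides `update_auth_model` to resolve the user's groups (and roles) from Keycloak itself rather than relying only on token claims. A much-simplified sketch of the shape of that customization (illustrative; `_fetch_keycloak_groups` is a hypothetical helper, not the file's actual contents):

```python
from oauthenticator.generic import GenericOAuthenticator

class KeycloakOAuthenticator(GenericOAuthenticator):
    """Simplified illustration of the customization point."""

    async def update_auth_model(self, auth_model):
        auth_model = await super().update_auth_model(auth_model)
        user_info = auth_model["auth_state"][self.user_auth_state_key]
        # Hypothetical step: ask Keycloak's admin API for the user's
        # groups and stash them on the auth model. Whether these come
        # back as names or paths is exactly what is in question here.
        user_info["groups"] = await self._fetch_keycloak_groups(user_info)
        return auth_model
```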
I initially assumed this was an issue with the upstream authenticator code, since we were passing the required settings but the groups did not seem to receive the correct values, even though I was able to confirm the values were being sent by Keycloak to the OAuth flow (you can verify this by inspecting the client user response body in the admin realm).
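For illustration, Keycloak's admin REST API represents a group with both a `name` and a `path` (the path keeps a leading slash and any parent segments), so the same group can surface in two different forms (fields trimmed here):

```python
# GroupRepresentation as returned by Keycloak's admin REST API
# (fields trimmed for illustration):
group = {
    "id": "…",           # elided
    "name": "analyst",   # bare group name
    "path": "/analyst",  # full path, leading slash included
}

# A nested group makes the difference more visible:
nested = {"name": "analyst", "path": "/data-team/analyst"}
```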
After talking with @krassowski, I am confident the issue is within our code, likely either in the base mounts function outlined above (as the contents of `groups` could differ from before) or in the custom Keycloak OAuth class logic.
I am currently inspecting the response object sent to both the Keycloak class and the base mounts function to determine which part of the codebase is at fault.
BTW, the shared directory randomly showed up for me after a while in Jupyter. I had previously SSH'd into the NFS pod and manually created the directory, so I'm not sure whether that affected it. I'm also not sure why it took so long; maybe a redeployment of Nebari was necessary.
I re-attempted this yesterday from `develop`, but the error didn't show up (locally) this time. I will test again using GCP (the same setup as when I first found this) to confirm whether it is still present.
I am trying to think of a reason why this appeared. I inspected the Keycloak authenticator, and the objects there seemed okay based on what was available.
One thing that was different in my last attempt was that I deployed the latest version directly, whereas the previous attempt was an upgrade from an older version to a more recent release. It could have been a silent issue somewhere within the EFS that I didn't catch.
The problem was later found to be an issue with the Keycloak `auth_state` data sent: a small difference between loading the group as `name` or `path` when updating the `oauth_user` specs under the `auth_state` object.
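In other words, depending on whether the group is read from `name` or `path`, downstream consumers see `analyst` or `/analyst`. A minimal sketch of the kind of normalization that reconciles the two (illustrative, not the actual patch):

```python
def normalize_group(group: str) -> str:
    """Reduce a Keycloak group path like "/data-team/analyst" to the
    leaf name expected by the shared-mount logic; a no-op for bare names."""
    return group.rstrip("/").rsplit("/", 1)[-1]

assert normalize_group("/analyst") == "analyst"
assert normalize_group("analyst") == "analyst"
assert normalize_group("/data-team/analyst") == "analyst"
```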