[ENH] - Expose Backup and Restore for Keycloak API
Feature description
Backup and Restore RFD
Value and/or benefit
Nebari Admin can access Keycloak API and query user data.
We need to create a backup controller to expose the existing Keycloak API to authenticated users. This can be a REST API using FastAPI.
/api/v1/keycloak: Expose the existing Keycloak Admin REST API (docs)
So for example:
GET /api/v1/keycloak/admin/realms/nebari/users should get us a list of all the non-admin users.
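A minimal sketch of the path mapping this implies, assuming the controller simply strips the `/api/v1/keycloak` prefix and forwards the rest to Keycloak. The function name `to_keycloak_url` and the base URL in the usage note are hypothetical, not part of Nebari:

```python
API_PREFIX = "/api/v1/keycloak"


def to_keycloak_url(request_path: str, keycloak_base_url: str) -> str:
    """Map a Nebari API path onto the matching Keycloak Admin REST API URL."""
    if not request_path.startswith(API_PREFIX + "/"):
        raise ValueError(f"Path is not under {API_PREFIX}: {request_path}")
    # Everything after the prefix is forwarded unchanged to Keycloak.
    upstream_path = request_path[len(API_PREFIX):]
    return keycloak_base_url.rstrip("/") + upstream_path
```

For example, `to_keycloak_url("/api/v1/keycloak/admin/realms/nebari/users", "http://keycloak.example/auth/")` would resolve to `http://keycloak.example/auth/admin/realms/nebari/users`.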
Anything else?
Related META issue: #2518
Proposed Design:
Nebari uses Keycloak to manage users and groups.
Keycloak also manages clients for services like JupyterHub, argo-server-sso, conda_store, etc. Nebari creates these clients automatically, so we don't have to expose them. Nebari also pre-creates roles, so we can skip exposing them.
The Nebari API will consolidate responses from the internal Keycloak API and return a single JSON response. This JSON representation should contain enough information to restore the same user in another instance of Keycloak.
Assumptions:
- The Keycloak instances on both sides of backup and restore are managed by Nebari and are thus expected to share the default setup of a Nebari installation.
- The endpoints provide a mechanism to retrieve and restore. The logical sequence of how endpoints will be called is left up to the client application code.
- The lists of internal Keycloak endpoints (below) were created using Keycloak REST API docs. While developing this feature, we might have to add/delete a few endpoints.
- If a user dependency is not present while restoring a user, for example, its groups, then those groups should be automatically created while creating a user.
With this context, to accurately re-create the Nebari Keycloak, we need the following endpoints:
Endpoints for backup
GET /users (Get all users)
sequenceDiagram
API Client ->> Nebari API: GET /users
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/profile
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/{id}
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/{id}/groups
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/{id}/role-mappings
Keycloak API-->>Nebari API:
Nebari API ->> API Client: [{< composite json user representation >}, {...}, ...]
GET /users/{id} (Get user details)
sequenceDiagram
API Client ->> Nebari API: GET /users/{id}
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/{id}
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/{id}/groups
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/{id}/role-mappings
Keycloak API-->>Nebari API:
Nebari API ->> API Client: {< composite json user representation >}
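To make the "composite JSON user representation" more concrete, here is a rough sketch of how the three Keycloak responses from the diagram could be merged. The output shape (`groups`, `realmRoles`, `clientRoles` keys) and the function name are assumptions for illustration; the input shapes follow Keycloak's group and role-mappings representations:

```python
from typing import Any, Dict, List


def build_composite_user(
    user: Dict[str, Any],
    groups: List[Dict[str, Any]],
    role_mappings: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge the user, groups and role-mappings responses into one document."""
    composite = dict(user)  # copy so the raw user response is left untouched
    # Group paths (e.g. "/developer") identify groups independently of their IDs.
    composite["groups"] = sorted(group["path"] for group in groups)
    # Realm-level roles are listed under "realmMappings".
    realm_roles = role_mappings.get("realmMappings", [])
    composite["realmRoles"] = sorted(role["name"] for role in realm_roles)
    # Client-level roles are grouped per client under "clientMappings".
    client_mappings = role_mappings.get("clientMappings", {})
    composite["clientRoles"] = {
        client: sorted(role["name"] for role in mapping.get("mappings", []))
        for client, mapping in client_mappings.items()
    }
    return composite
```

Storing names and paths rather than IDs is a deliberate choice in this sketch, since IDs are not expected to match between the source and target Keycloak.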
Endpoints for restore
POST /users/ (Create a new user.)
sequenceDiagram
API Client ->> Nebari API: POST /users/
Nebari API-->>Keycloak API: POST /admin/realms/{realm}/users
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: PUT /admin/realms/{realm}/users/{id}/reset-password
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: PUT /admin/realms/{realm}/users/{id}/groups/{groupId}
Keycloak API-->>Nebari API:
Nebari API-->>Keycloak API: GET /admin/realms/{realm}/users/{id}/role-mappings
Keycloak API-->>Nebari API:
Nebari API ->> API Client: {< composite json user representation >}
DELETE /users/{id} (Delete a user)
sequenceDiagram
API Client ->> Nebari API: DELETE /users/{id}
Nebari API-->>Keycloak API: DELETE /admin/realms/{realm}/users/{id}
Keycloak API-->>Nebari API:
Nebari API ->> API Client: {}
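The restore sequence for `POST /users/`, together with the assumption that missing groups are created on the fly, could be planned roughly as follows. This is only a sketch: `plan_user_restore` is a hypothetical helper, groups are assumed to be top-level and identified by name, and `{id}` / `{groupId}` stay as placeholders because they are only known once Keycloak has created the resources:

```python
from typing import Any, Dict, List, Set, Tuple

Step = Tuple[str, str, str]  # (HTTP method, Keycloak path, short description)


def plan_user_restore(
    user: Dict[str, Any], existing_groups: Set[str], realm: str
) -> List[Step]:
    """List the Keycloak calls needed to restore one composite user, in order."""
    base = f"/admin/realms/{realm}"
    wanted_groups = user.get("groups", [])
    steps: List[Step] = []
    # Create any group the user depends on that the target realm lacks.
    for name in wanted_groups:
        if name not in existing_groups:
            steps.append(("POST", f"{base}/groups", f"create group {name}"))
    steps.append(("POST", f"{base}/users", f"create user {user['username']}"))
    steps.append(("PUT", f"{base}/users/{{id}}/reset-password", "set password"))
    # Membership is assigned only after both user and groups exist.
    for name in wanted_groups:
        steps.append(
            ("PUT", f"{base}/users/{{id}}/groups/{{groupId}}", f"join group {name}")
        )
    return steps
```

Returning a plan instead of issuing requests keeps the ordering logic easy to test without a running Keycloak.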
That's a nice description of all the concepts, thanks @pt247. Have you started working on this yet? I am considering scheduling a meeting for this as well.
Thanks @viniciusdc
> Have you started working on this yet? I am considering scheduling a meeting for this as well

No, I have not started. A meeting focusing only on Keycloak backup and restore would help.
interesting link: https://github.com/nebari-dev/nebari/compare/develop...viniciusdc:nebari:1784-keycloak-add-multiple-users
@tylergraff I assume you have some experience with backup and restore. I was wondering how you migrate credentials. Also, can you share any scripts you use to automate or semi-automate Keycloak migration?
@kalpanachinnappan We are starting work on backup and restore. Keycloak is the first component we are considering. We would like your input on this as well.
As this solution may rely on the Keycloak API, @viniciusdc suggested we might want to compare the current Keycloak API with the latest version of Keycloak.
Yes, mainly because newer Keycloak versions offer more flexible export/import features (which might help us handle the credentials issue).
Just as a refresher here, to recap what has been discussed so far, and use this as a foundation for the overall architecture and development strategies for other services later on:
Goals of the Backup and Restore
- An atomic structure to support individual exporting mechanisms, such as external backup-restore tools (provided by the user), with control over the backed-up User/Group payload data.
  - This will help users remove corrupted data or harmful resources in the service state.
- Restoration of a service (Keycloak in this case) to a specified state: e.g., what main components are required for Keycloak to return to its previous state?
  - As an addendum, since we are attempting to rely solely on Keycloak's API and to minimize data conflicts as much as possible, "state" here refers to the closest set of configurations that gives the same level of service control as the previous installation. For example, it will be impossible to keep exactly the same metadata, but the overall structure of groups, users, and authenticators will be replicated and reinstated as new data in the new version of the service (be it a new cluster or another scenario).
- Docs to provide guidance in case of a complete manual procedure (suppose the restore mechanism is at fault).
Components
The major endpoints have been outlined above in previous comments, but the general overview of the data we will be looking for so far will be:
- Users: A JSON list comprising all data/metadata of each user, such as user ID, email, etc., the same fields present in the user settings form in the Keycloak UI. Additionally, we will include a field for credentials if such information can be extracted; otherwise, we will initiate a reset password flow.
- Groups: A JSON list comprising all data/metadata of each group, such as group name, ID, roles, etc. Additionally, this model will include the users who belong to a given group.
- Authenticators and clients: Besides the default ones provided by Nebari, users might have set up other clients and extra configurations that will likely need to be carried over for the whole ecosystem to work after a restore or migration.
This will be made customizable so that this list can be extended, in a modular way.
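One way the "extensible, modular list" could look is a small registry that components add themselves to. This is a sketch under assumptions: `register_component`, `run_export`, and the placeholder exporters are hypothetical names, and the returned data is dummy data standing in for real Keycloak responses:

```python
from typing import Any, Callable, Dict, List

Exporter = Callable[[], List[Dict[str, Any]]]

# Maps a component name ("users", "groups", ...) to the function exporting it.
_EXPORTERS: Dict[str, Exporter] = {}


def register_component(name: str) -> Callable[[Exporter], Exporter]:
    """Decorator that adds an exporter to the registry under `name`."""
    def decorator(func: Exporter) -> Exporter:
        if name in _EXPORTERS:
            raise ValueError(f"Component already registered: {name}")
        _EXPORTERS[name] = func
        return func
    return decorator


@register_component("users")
def export_users() -> List[Dict[str, Any]]:
    return [{"username": "alice"}]  # placeholder for a real Keycloak call


@register_component("groups")
def export_groups() -> List[Dict[str, Any]]:
    return [{"name": "developer"}]  # placeholder for a real Keycloak call


def run_export() -> Dict[str, List[Dict[str, Any]]]:
    """Run every registered exporter and collect the results by component."""
    return {name: exporter() for name, exporter in _EXPORTERS.items()}
```

Adding a new component (say, identity providers) would then only require one more decorated function, without touching `run_export`.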
Overall structure
graph TD;
subgraph Keycloak
EM([Export Mechanism])
IM([Import Mechanism])
end
BnR[Backup & Restore]
B([Backup])
R([Restore])
BnR --> B
BnR --> R
B --> EM
R --> IM
For the implementation details:
- Initially, the backup/restore service will run as a standalone pod backed by FastAPI, though we are also considering building it as a Go application to make it easier to ship and run outside of the Kubernetes context.
- Backup and Restore are two independent sets of routes in the Backup & Restore (BARe) service, which may itself handle some of the payload work (authenticating with Keycloak, processing users/groups, validating merge conflicts when restoring, etc.).
- "Import mechanism" and "export mechanism" are generic terms that apply to all services; in practice they are our payloads, or whatever adapter is needed to generate serializable data that both sides can consume in either direction.
  - For Keycloak, this is the aggregation of each component's endpoint data, possibly with some parsing.
Now some details that might be interesting to have in the perspective of the BARe service:
- Compress all backed-up data into tar files on a block storage of your choice, defaulting to a PV.
- Restore can also be used to revert harmful changes made to the current cluster (e.g., a corrupted state of conda-store or Keycloak), so the service is useful within the current cluster and not only when migrating or upgrading.
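As an illustration of the tar-based storage idea, the sketch below packs per-component JSON payloads into a gzip-compressed tar archive in memory and reads them back. The `keycloak/<component>.json` layout and the function names are assumptions; where the resulting bytes are written (PV, object storage, ...) is left to the caller:

```python
import io
import json
import tarfile
from pathlib import PurePosixPath
from typing import Any, Dict


def pack_backup(components: Dict[str, Any]) -> bytes:
    """Serialize each component to JSON and bundle them into a tar.gz blob."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in sorted(components.items()):
            data = json.dumps(payload, indent=2).encode("utf-8")
            info = tarfile.TarInfo(name=f"keycloak/{name}.json")
            info.size = len(data)  # must be set before adding the member
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack_backup(blob: bytes) -> Dict[str, Any]:
    """Read a blob produced by `pack_backup` back into a component mapping."""
    components: Dict[str, Any] = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        for member in archive.getmembers():
            extracted = archive.extractfile(member)
            if extracted is None:  # skip anything that is not a regular file
                continue
            component = PurePosixPath(member.name).stem
            components[component] = json.loads(extracted.read().decode("utf-8"))
    return components
```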
Action items
- [x] Validate JSON payloads for each Keycloak endpoint:
  - Make sure there's enough data to create a copy of the Keycloak service "state" from one cluster to the other. This will also help determine which models/responses will need parsing and which will not.
- [x] The service's POC is a very simple script that runs the endpoints and showcases the restoring and backup strategy outlined above.
- [x] Develop a more robust POC, with the goal of exposing enough of the API that users can develop their own BARe tools.
Quick updates:
Validate JSON payloads for each Keycloak endpoint:
- User credentials might not be included in the user payload objects. This is due to how Keycloak's password security policies work (passwords are hashed one-way). To work around this, we have two options:
  - Create each user and send a reset password request so that the re-authenticate (set a new password) workflow automatically triggers upon user login.
    - PROS: Keeps us in line with Keycloak's security guidelines and limits the exposure of users' data.
    - CONS: Requires users to go through a re-authentication workflow, which could be a hassle in scenarios where the user hierarchy differs, e.g., project clients, external users, etc.
  - Or directly connect to Keycloak's PostgreSQL DB and retrieve the credentials (a code snippet for that can be seen in a follow-up comment below).
- IDPs may not be backed up (unless we follow the DB retrieval approach below). The main obstacle is that the IDP secret needed to finish the setup can't be requested from the admin APIs (same reasoning as above). Also, while less problematic, if the DNS of the target cluster changes, the config will be inconsistent, so manual intervention would still be required. IDPs can, however, be exported from the admin console UI, so that procedure can still be documented.
  - A caveat of this decision, yet to be thoroughly tested, concerns users whose login is managed by an IDP, in which case syncing could be compromised during password reset.
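For the first option (create the user, then force a password reset), the request to Keycloak could be built like this. The helper name is hypothetical; the body follows Keycloak's credential representation, where `temporary: true` makes Keycloak ask the user for a new password at next login:

```python
import secrets
from typing import Any, Dict, Tuple


def reset_password_request(
    realm: str, user_id: str, nbytes: int = 16
) -> Tuple[str, str, Dict[str, Any]]:
    """Build the method, path and body for a temporary-password reset."""
    body = {
        "type": "password",
        # Random throwaway value; the user replaces it at first login.
        "value": secrets.token_urlsafe(nbytes),
        "temporary": True,
    }
    path = f"/admin/realms/{realm}/users/{user_id}/reset-password"
    return "PUT", path, body
```

How the temporary value reaches the user (email, admin hand-off, ...) is outside the scope of this sketch.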
Action items from this:
- [x] Test whether IDP-managed users can have their passwords reset during migration.
  - They can, and they should either be given a temporary password or be sent a reset email through the configured SMTP provider.
- [x] Test whether existing users (previously managed by an IDP) can be added without such an IDP and whether they will automatically sync when the IDP is finally set up.
  - This is part of the reconciliation workflow, where users who depend on any IDP not present in the realm at the time of the migration will be set as disabled and configured to be automatically merged with their IDP counterpart.
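The "disable users whose IDP is missing" part of that reconciliation could look roughly like the sketch below. It assumes each exported user carries Keycloak's `federatedIdentities` list (with an `identityProvider` alias per entry); the function name is hypothetical, and the later merge with the IDP counterpart is not covered here:

```python
from typing import Any, Dict, Iterable, List


def reconcile_idp_users(
    users: List[Dict[str, Any]], available_idps: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return copies of `users`, disabling those whose IDP is not in the realm."""
    available = set(available_idps)
    reconciled = []
    for user in users:
        links = user.get("federatedIdentities", [])
        required = {link["identityProvider"] for link in links}
        restored = dict(user)  # leave the exported record untouched
        if required - available:
            # At least one IDP this user logs in through is missing.
            restored["enabled"] = False
        reconciled.append(restored)
    return reconciled
```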
import json
import os
from functools import cached_property

import psycopg
from psycopg.rows import dict_row

# Connection settings, assumed here to come from environment variables of the same name.
KEYCLOAK_DATABASE_NAME = os.getenv("KEYCLOAK_DATABASE_NAME")
KEYCLOAK_DATABASE_USER = os.getenv("KEYCLOAK_DATABASE_USER")
KEYCLOAK_DATABASE_PASSWORD = os.getenv("KEYCLOAK_DATABASE_PASSWORD")
KEYCLOAK_DATABASE_HOST = os.getenv("KEYCLOAK_DATABASE_HOST")
KEYCLOAK_DATABASE_PORT = os.getenv("KEYCLOAK_DATABASE_PORT")


class KeycloakPasswordFetcher:
    @cached_property
    def pg_connection(self):
        """Establish a direct connection to the Keycloak PostgreSQL database."""
        return psycopg.connect(
            f"dbname={KEYCLOAK_DATABASE_NAME} "
            f"user={KEYCLOAK_DATABASE_USER} "
            f"password={KEYCLOAK_DATABASE_PASSWORD} "
            f"host={KEYCLOAK_DATABASE_HOST} "
            f"port={KEYCLOAK_DATABASE_PORT}",
            row_factory=dict_row,
        )

    def get_keycloak_password(self, user_id):
        """Retrieve secret data from the Keycloak PostgreSQL database (not accessible via REST API)."""
        # Only password credentials carry a salt; other types (e.g. OTP) are skipped.
        query = """
            SELECT secret_data, credential_data, created_date
            FROM credential
            WHERE user_id=%(user_id)s AND type='password'
            ORDER BY created_date DESC;
        """
        with self.pg_connection.cursor() as cur:
            cur.execute(query, {"user_id": user_id})
            row = cur.fetchone()  # most recent credential first
        if row:
            secret_data = json.loads(row["secret_data"])
            credential_data = json.loads(row["credential_data"])
            return {
                "algorithm": credential_data["algorithm"],
                "iterations": credential_data["hashIterations"],
                "salt": secret_data["salt"],
                "hash": secret_data["value"],
            }
        raise ValueError(f"No credential found for user_id {user_id}")


fetcher = KeycloakPasswordFetcher()
user_id = "<UUID>"
password = "{algorithm}${iterations}${salt}${hash}".format(
    **fetcher.get_keycloak_password(user_id=user_id),
)
Based on what has been discussed above, this is the general layout of the Keycloak import/export connectors:
import json
import os
from typing import Any, Dict, List, Optional

import requests
import requests_cache
from pydantic import BaseModel, ConfigDict, Field


class RoleSchema(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    id: Optional[str] = None
    name: str
    description: Optional[str] = None
    composite: bool = False
    clientRole: bool = Field(False, alias="clientRole")
    containerId: Optional[str] = Field(None, alias="containerId")


class KeycloakSkeleton(BaseModel):
    # "depends_on" is custom metadata declaring that roles depend on clients.
    roles: List[RoleSchema] = Field(
        default_factory=list, json_schema_extra={"depends_on": ["clients"]}
    )


class KeycloakAuth(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    # Values default to the environment when not passed explicitly.
    url: Optional[str] = Field(os.getenv("KEYCLOAK_URL"), alias="auth_url")
    realm: Optional[str] = Field(os.getenv("KEYCLOAK_REALM"), alias="realm")
    client_id: Optional[str] = Field(os.getenv("KEYCLOAK_CLIENT_ID"), alias="client_id")
    client_secret: Optional[str] = Field(
        os.getenv("KEYCLOAK_CLIENT_SECRET"), alias="client_secret"
    )


class KeycloakAPIClient:
    def __init__(self, auth: Dict[str, str]):
        self.auth = self._validate_auth(auth)
        self.token = None

    def _validate_auth(self, auth: Dict[str, str]) -> Dict[str, str]:
        required_keys = ["url", "realm", "client_id", "client_secret"]
        # Treat empty or None values as missing, not only absent keys.
        missing_keys = [key for key in required_keys if not auth.get(key)]
        if missing_keys:
            raise ValueError(f"Missing required authentication parameters: {missing_keys}")
        return auth

    def _authenticate(self) -> None:
        if self.token and self._is_token_valid():
            return
        response = requests.post(
            url=f"{self.auth['url']}/realms/{self.auth['realm']}/protocol/openid-connect/token",
            data={
                "client_id": self.auth["client_id"],
                "client_secret": self.auth["client_secret"],
                "grant_type": "client_credentials",
            },
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        response.raise_for_status()
        self.token = response.json()["access_token"]

    def _is_token_valid(self) -> bool:
        introspection_response = requests.post(
            url=f"{self.auth['url']}/realms/{self.auth['realm']}/protocol/openid-connect/token/introspect",
            data={
                "client_id": self.auth["client_id"],
                "client_secret": self.auth["client_secret"],
                "token": self.token,
            },
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        introspection_response.raise_for_status()
        return introspection_response.json().get("active", False)

    def get(self, endpoint: str) -> List[Dict[str, Any]]:
        self._authenticate()
        response = requests.get(
            url=f"{self.auth['url']}{endpoint}",
            headers={"Authorization": f"Bearer {self.token}"},
        )
        response.raise_for_status()
        return response.json()

    def post(self, endpoint: str, json: Dict[str, Any]) -> None:
        self._authenticate()
        response = requests.post(
            url=f"{self.auth['url']}{endpoint}",
            json=json,
            headers={"Authorization": f"Bearer {self.token}"},
        )
        response.raise_for_status()


class KeycloakExport:
    def __init__(self, api_client: KeycloakAPIClient, state: KeycloakSkeleton):
        self.api_client = api_client
        self.state = state

    def _export_roles(self) -> List[Dict[str, Any]]:
        print("Exporting role data from Keycloak...")
        # The realm lives in the auth settings; the skeleton only models the data.
        realm = self.api_client.auth["realm"]
        data = self.api_client.get(f"/admin/realms/{realm}/roles")
        roles = [RoleSchema(**item).model_dump() for item in data]
        self._save_to_file(roles)
        return roles

    def _save_to_file(self, roles: List[Dict[str, Any]]) -> None:
        os.makedirs("keycloak", exist_ok=True)
        with open("keycloak/roles.json", "w") as file:
            json.dump(roles, file, indent=4)
        print("Roles have been saved to 'keycloak/roles.json'.")


class KeycloakImport:
    def __init__(self, api_client: KeycloakAPIClient, state: KeycloakSkeleton):
        self.api_client = api_client
        self.state = state

    def _import_roles(self) -> None:
        print("Importing role data into Keycloak...")
        roles = self._load_from_file()
        roles_schema = [RoleSchema(**item) for item in roles]
        realm = self.api_client.auth["realm"]
        for role in roles_schema:
            self.api_client.post(f"/admin/realms/{realm}/roles", json=role.model_dump())
        print("Roles have been imported successfully.")

    def _load_from_file(self) -> List[Dict[str, Any]]:
        with open("keycloak/roles.json", "r") as file:
            roles = json.load(file)
        return roles


class Keycloak:
    """
    Main service class for interacting with Keycloak's API, exposing data export and import.
    """

    state = KeycloakSkeleton()

    def __init__(self, auth: Optional[Dict[str, str]] = None):
        self.api_client = KeycloakAPIClient(KeycloakAuth(**(auth or {})).model_dump())
        self.keycloak_export = KeycloakExport(self.api_client, self.state)
        self.keycloak_import = KeycloakImport(self.api_client, self.state)
        # Enable caching for the requests library
        requests_cache.install_cache("keycloak_cache", expire_after=300)
        # Dynamically expose methods from KeycloakExport and KeycloakImport
        self._expose_methods(self.keycloak_export, "export")
        self._expose_methods(self.keycloak_import, "import")

    def _expose_methods(self, obj, prefix):
        # Expose private helpers such as `_export_roles` as `export_roles`.
        marker = f"_{prefix}_"
        for method_name in dir(obj):
            if method_name.startswith(marker) and callable(getattr(obj, method_name)):
                setattr(self, method_name[1:], getattr(obj, method_name))


# Example usage
if __name__ == "__main__":
    auth_params = {
        # Include the "/auth" base path here if the Keycloak deployment uses one.
        "url": "http://localhost:8080",
        "realm": "myrealm",
        "client_id": "myclient",
        "client_secret": "mysecret",
    }
    keycloak = Keycloak(auth=auth_params)
    # Export roles
    exported_roles = keycloak.export_roles()
    print("Exported Roles:", exported_roles)
    # Import roles
    keycloak.import_roles()
    print("Roles Imported Successfully")
This is a miniature version of the general structure so that it can be executed and tested; the complete code has all the required resources (Users, Roles, ...) and its reconciliation method to handle duplicated entries and the inner dependencies between the users/groups data.
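The reconciliation step for duplicated entries is not shown in the miniature version. A very reduced sketch of the idea, assuming entries are matched by a single key such as the role `name` (the helper name and the matching rule are assumptions, not the actual implementation):

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def split_new_and_existing(
    incoming: List[Entry], existing: List[Entry], key: str = "name"
) -> Tuple[List[Entry], List[Entry]]:
    """Separate backed-up entries into ones to create and ones already present."""
    existing_keys = {entry[key] for entry in existing}
    to_create = [entry for entry in incoming if entry[key] not in existing_keys]
    duplicates = [entry for entry in incoming if entry[key] in existing_keys]
    return to_create, duplicates
```

Only `to_create` would be posted to Keycloak; `duplicates` could be skipped, updated, or reported, depending on the chosen conflict policy.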
The main structure for each serializable service follows the same standard as below. In this example, we outlined the backup path, though restoration will rely on the same objects:
More information about the general responsibilities of the components outlined in the diagram above can be found in #2650.
For future reference: we discussed each route and its endpoints at the beginning of this issue, but a quick summary of what the "Export Mechanism" and "Import Mechanism" actually do is still needed:
The "Export Mechanism" ("Import Mechanism") is a gateway to the endpoints of the Keycloak components that require exporting (importing). Storage, meanwhile, is managed by a higher-level process of the backup-restore application.
Keycloak export sequence
sequenceDiagram
participant KeycloakServer
participant ExportProcess
participant ExportData
par Export Roles
ExportProcess->>KeycloakServer: Get Roles (ROLE)
KeycloakServer-->>ExportProcess: Roles Data
ExportProcess->>ExportData: Store Roles Data
and Export Groups
ExportProcess->>KeycloakServer: Get Groups (GROUP)
KeycloakServer-->>ExportProcess: Groups Data
ExportProcess->>ExportData: Store Groups Data
and Export Providers
ExportProcess->>KeycloakServer: Get IDPS (PROVIDERS)
KeycloakServer-->>ExportProcess: Providers Data
ExportProcess->>ExportData: Store Providers Data
and Export Users
ExportProcess->>KeycloakServer: Get Users (USER)
KeycloakServer-->>ExportProcess: Users Data
ExportProcess->>ExportData: Store Users Data
and Export Clients
ExportProcess->>KeycloakServer: Get Clients (CLIENT)
KeycloakServer-->>ExportProcess: Clients Data
ExportProcess->>ExportData: Store Clients Data
end
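The `par` block above means the five component exports are independent of each other. A rough sketch of running them concurrently, assuming a `fetch(path)` callable (for example, a bound `KeycloakAPIClient.get`) is passed in; the endpoint table uses Keycloak Admin REST API paths, while `export_all` itself is a hypothetical helper:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# Admin REST API path for each exported component.
COMPONENT_ENDPOINTS = {
    "roles": "/admin/realms/{realm}/roles",
    "groups": "/admin/realms/{realm}/groups",
    "providers": "/admin/realms/{realm}/identity-provider/instances",
    "users": "/admin/realms/{realm}/users",
    "clients": "/admin/realms/{realm}/clients",
}


def export_all(fetch: Callable[[str], Any], realm: str) -> Dict[str, Any]:
    """Fetch every component in parallel and return the results by name."""
    with ThreadPoolExecutor(max_workers=len(COMPONENT_ENDPOINTS)) as pool:
        futures = {
            name: pool.submit(fetch, path.format(realm=realm))
            for name, path in COMPONENT_ENDPOINTS.items()
        }
        # result() re-raises any exception from the worker thread.
        return {name: future.result() for name, future in futures.items()}
```

Threads are used here because the work is I/O-bound HTTP calls; an async client would be an equally reasonable choice.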
Keycloak import sequence
sequenceDiagram
participant StoredData
participant ImportProcess
participant KeycloakServer
StoredData->>ImportProcess: Start Import Process
ImportProcess->>KeycloakServer: Create/Update Groups (GROUP)
KeycloakServer-->>ImportProcess: Response
ImportProcess->>KeycloakServer: Create/Update Users (USER)
KeycloakServer-->>ImportProcess: Response
ImportProcess->>KeycloakServer: Assign Users to Groups (USER_GROUP)
KeycloakServer-->>ImportProcess: Response
ImportProcess->>KeycloakServer: Create/Update Clients (CLIENT)
KeycloakServer-->>ImportProcess: Response
ImportProcess->>KeycloakServer: Assign Roles to Clients (CLIENT_ROLE)
KeycloakServer-->>ImportProcess: Response
ImportProcess->>KeycloakServer: Get Roles for Clients
KeycloakServer-->>ImportProcess: Response
ImportProcess->>KeycloakServer: Create Missing Roles (ROLE)
KeycloakServer-->>ImportProcess: Response
ImportProcess->>KeycloakServer: Configure Identity Providers (IDENTITY_PROVIDER)
KeycloakServer-->>ImportProcess: Response
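Unlike export, the import has to respect dependencies between components (users must exist before they can join groups, and so on). One way to derive a valid order is a topological sort. The dependency table below is an assumption loosely based on the sequence above, not the implemented one, and it relies on `graphlib` from the standard library (Python 3.9+):

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the set of steps that must finish before it can run.
IMPORT_DEPENDENCIES: Dict[str, Set[str]] = {
    "groups": set(),
    "users": set(),
    "user_groups": {"groups", "users"},
    "clients": set(),
    "client_roles": {"clients"},
    "identity_providers": set(),
}


def import_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the import steps in an order that satisfies all dependencies."""
    # static_order() raises graphlib.CycleError if the table contains a cycle.
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `import_order(IMPORT_DEPENDENCIES)` yields every step exactly once, with `user_groups` after both `groups` and `users`, and `client_roles` after `clients`.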
This was moved to the implementation phase and now works as expected. #2657 will handle the modifications required to make it available in Nebari.