authentik
authentik copied to clipboard
root: Improve Redis config and support
Details
Resolves #5077, resolves #1285, resolves #3979, resolves #5531
External source code
Go
- Redis URL parser adapted from gitea project
- Redisstore adapted from BoxGo
Python
- Kombu cluster support adapted from https://github.com/celery/kombu/pull/1021
- Celery cluster support adapted from https://github.com/hbasria/celery-redis-cluster-backend
Changes
New Features
- Uses one URL for configuration of Redis (shared between Go and Python)
- The Python URL parser has been created in a way to resemble the Go version
- By default uses Redis for broker, channel, backend (current state), but also allows for custom options
- Moves part of the configuration currently under redis category into new dedicated sections -> split into https://github.com/goauthentik/authentik/pull/7097
- Adds deprecation check to config parser to allow for easy transition between old and new configuration scheme
- Old Redis configuration env vars are automatically converted to a Redis config URL if no Redis config URL has been configured otherwise they are ignored
- The new Redis URL allows environment variables to be included in bash style ${VAR}
Breaking Changes
- None -> Old configurations are still supported however users are advised to switch to the new URL-based configuration
Additional information
- Newly supported Redis configurations are only available using the new Redis config URL environment variable
- The Redis config URL requires special characters in parameters to be URL encoded (e.g. username, password)
- Has been successfully tested using Redis cluster, sentinel and Unix socket connections
Todo
- [x] ~~Move from
RedisChannelLayer
toRedisPubSubChannelLayer
~~ -> not possible https://github.com/redis/redis-py/issues/2219 - [ ] Add documentation for Redis URL parameters
- [ ] Add unittests for Redis middleware
Sentinel
- [x] Handle ReadOnly error when master is demoted to slave
- [x] Fix unittests
Cluster
- [x] Remove all usage of
KEYS
,MGET
andMSET
in Django and Celery - [x] Fix unittests
- [ ] Check all connections are correctly closed e.g. in redis_middleware_channels.py
Deploy Preview for authentik-storybook canceled.
Name | Link |
---|---|
Latest commit | 2d57d4e70615a3896d38f7fb2dc3cae549d32f97 |
Latest deploy log | https://app.netlify.com/sites/authentik-storybook/deploys/663d6f3edec92d000839eb36 |
Deploy Preview for authentik ready!
Name | Link |
---|---|
Latest commit | dd71ad5625c353dae88ca900ad21821a1e0353a2 |
Latest deploy log | https://app.netlify.com/sites/authentik/deploys/65c6cfbc8399db000860127e |
Deploy Preview | https://deploy-preview-7118--authentik.netlify.app |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
Codecov Report
Attention: Patch coverage is 86.84896%
with 202 lines
in your changes are missing coverage. Please review.
Project coverage is 92.19%. Comparing base (
046b8d5
) to head (2d57d4e
).
Additional details and impacted files
@@ Coverage Diff @@
## main #7118 +/- ##
==========================================
- Coverage 92.37% 92.19% -0.18%
==========================================
Files 704 713 +9
Lines 34401 35930 +1529
==========================================
+ Hits 31779 33127 +1348
- Misses 2622 2803 +181
Flag | Coverage Δ | |
---|---|---|
e2e | 48.79% <27.99%> (-0.85%) |
:arrow_down: |
integration | 25.45% <27.99%> (+0.08%) |
:arrow_up: |
unit | 89.71% <86.13%> (-0.12%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Several parts of this PR have been split up into https://github.com/goauthentik/authentik/pull/7117 and https://github.com/goauthentik/authentik/pull/7097.
Testing image available at https://hub.docker.com/repository/docker/thegrandpkizzle/authentik/tags?page=1&ordering=last_updated Please report any issues especially when using a non-standard Redis configuration (sentinel, cluster).
@PKizzle Looks like its failing on a sql migration... https://gist.github.com/kashalls/b0021ee1e9909d055711710b54d7abee
Just double-confirmed it now. I migrated back to prod authentik and it synced up just fine. Went back to the latest container you published and its stuck on pg deadlock.
@PKizzle Okay so weird turn of events. I left it alone and went for some lunch. It eventually came backup.
However any attempt at logging in returns an error with the request id: 0e2e114a695c4c3fade06a84fae64872
I've updated this gist with a new log attached.
The errors in the first log file are related to dce913496e861c26062a469aa63cfea8d4ebee16 that I have merged from main into this branch in 3b68053c2d0807e1c22055f69d3a51e04d2096e6. Maybe @BeryJu wants to have a look at it. The migration seems to deadlock itself. @kashalls could you please open a separate issue to keep track of this?
The second log file seems to hint at a failed migration where ALTER TABLE otp_static_staticdevice RENAME TO authentik_stages_authenticator_static_staticdevice;
has not been executed successfully.
That error should be fixed in the latest beta since https://github.com/goauthentik/authentik/pull/7185
I have pushed new images that contain the mentioned fix and removed the tags which resulted in the failed migration errors.
@PKizzle Still seeing this issue https://gist.github.com/kashalls/b0021ee1e9909d055711710b54d7abee#file-authentik-log-2
Server did not complete the migration.
Looks like worker did complete the migration though.
It seems to be running migrations every single time the containers run.
Rebooting the container did end up bringing it up, but now I am seeing You can't write against a read only replica.
(when getting it the host of the redis master sentinel).
Still trying to get it to fully load on the authentik-server... Will post when or if I get it working
Alright, so after 16 minutes of it cycling itself I got to this point. https://gist.github.com/kashalls/b0021ee1e9909d055711710b54d7abee#file-authentik-log-3
This may just be a me issue at this point, but I am passing the redis sentinel master host in and its trying to write to the read only node.
Edit: Looks like a upstream issue: https://github.com/redis/redis-py/issues/626
Let's see whether I can patch redis-py to use to correct command
@kashalls I have updated the images with an updated master discovery implementation for Redis sentinel
I've been trying to test 2023.8.3-abc2229
but I keep getting deadlocks.
Operations to perform:
Target specific migration: 0009_auto_20190420_0723, from authentik_stages_authenticator_totp
Running migrations:
Applying authentik_stages_authenticator_totp.0009_auto_20190420_0723... FAKED
2023-10-17 18:58:47 [info ] Migration finished applying migration=otp_merge.py
2023-10-17 18:58:47 [info ] applying django migrations
2023-10-17 18:58:47 [info ] waiting to acquire database lock
Traceback (most recent call last):
File "/lifecycle/migrate.py", line 103, in <module>
wait_for_lock()
File "/lifecycle/migrate.py", line 57, in wait_for_lock
I have been testing the images locally using an empty database and have not had any migration issues.
Since it tries to apply the same migration over and over again could you please directly check in your DB whether the operation has been performed successfully? What happens if you run the SQL statement manually?
Edit: In the django_migrations
table the authentik_stages_authenticator_totp.0009_auto_20190420_0723
migration might be marked as faked (marked as having happened although it was never applied). You could try to remove the entry and test again. Please note that I am no Django DB migration expert and can not help with any data loss caused by any of the actions above.
So I dropped the database for giggles (I have backups), and I am still seeing that its trying to write to the read-only replica.. https://gist.github.com/kashalls/b0021ee1e9909d055711710b54d7abee#file-2023-8-3-abc2229-log
Checking db3, shows that there are keys for celery tasks so it is getting the master some of the times.
So you can not find any keys with the "authentik_channels_" prefix?
So you can not find any keys with the "authentik_channels_" prefix?
I missed it when I was scanning the keys, I do see authentik_channels_:group:group_outpost_uuid
.
I try to replicate the problem locally. Your setup is one Redis master + read only replica + sentinel, correct?
I try to replicate the problem locally. Your setup is one Redis master + read only replica + sentinel, correct?
Running redis sentinel, 3 nodes each with 1 redis + 1 sentinel. Master gets elected by sentinel. This is the redis chart I am running: https://github.com/bitnami/charts/tree/main/bitnami/redis
Here is my configuration as code: Authentik
- externalsecret.yaml holds my env details
- helmrelease holds the chart and configuration
Redis for redis chart configuration details
@PKizzle So I did a hard reboot of the entire cluster ( was preforming a os upgrade ). Flushed the cache and everything "seems" to be working fine. Later tonight I will mess with it to see if its stable
This is the problem I have with the Python Redis implementation: It works under ideal circumstances but as soon as i.e. the cluster is still initializing nothing works as expected. I wonder how other projects can rely on redis-py in its current state.
This is the problem I have with the Python Redis implementation: It works under ideal circumstances but as soon as i.e. the cluster is still initializing nothing works as expected. I wonder how other projects can rely on redis-py in its current state.
When I was rebooting, I noticed that authentik waited a little bit for the redis instances to come up and configure themselves. Question is does it support changing masters?
In theory it should. However, considering the frankenstein code other projects are using as wrapper around redis-py I can not tell whether it works reliably. Please try and report any issues you encounter.
In theory it should. However, considering the frankenstein code other projects are using as wrapper around redis-py I can not tell whether it works reliably. Please try and report any issues you encounter.
~~I tried really hard to get it to crash on my end. I think because I had my master timeout so low that it failed over quickly and I only had one or two errors about connecting and it reconnected.~~
I take it back. Its throwing the You can't write against a read only replica
again after node-2
pulled master from node-0
. I am starting to think that the redis maintainers don't know how their projects function considering that I can't use RedisInsight-v2 to visually look at the keys (it fails when connecting to sentinel, but cli works just fine?)
Could you create another gist with the log file when changing masters?
Could you create another gist with the log file when changing masters?
@PKizzle I can't get you one from the server, struggling to recreate it. Here's what shows up on the worker side: https://gist.github.com/kashalls/0f75b0446e933deed675f8f231be089e#file-authelia-worker-abc2229
@PKizzle Was there any more progress on this?
@kashalls Fixing the ReadOnly error requires manually checking each existing connection for a master change as it is not automatically handled by redis-py. As a quick fix I am now creating new connections for each command send to Redis.
The suggestion in https://github.com/redis/redis-py/issues/688 to handle the ReadOnly error outside of redis-py shows how much basic functionality is still missing in the project. The follow-up issue https://github.com/redis/redis-py/issues/935 was not even answered.