Federated users being deleted when LDAP Connection drops/times out

Open zts3y opened this issue 2 years ago • 14 comments

Before reporting an issue

  • [X] I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

Area

ldap

Describe the bug

This is similar (identical?) to other bugs I found that were stated as being fixed in earlier versions of Keycloak. It has been observed that if the LDAP query fails, users are lost at that time. It's unclear to me at this point how it is determined which rows are dropped.

Attaching a log from a localhost connected to an external LDAP via a tunnel.

KeycloakLogs.txt

Version

21.1.1

Expected behavior

The federated users should be retained, and Keycloak should fail gracefully if the query to LDAP fails.

Actual behavior

When there is a network connectivity issue at the moment Keycloak is querying LDAP, users are deleted completely from the USER_ENTITY table, and the Keycloak logs make no mention of the deletion.

How to Reproduce?

Interrupt network connectivity during an LDAP query

Anything else?

No response

zts3y avatar Nov 09 '23 16:11 zts3y

#21359 #12257 #9520 are the other tickets I found with same behavior, but those are closed and some state it was fixed in old versions

zts3y avatar Nov 09 '23 18:11 zts3y

This is a critical issue. Users must not be automatically deleted in the background without any notification, not even in the log. We have reproduced this issue several times with Keycloak 15 and also with Keycloak 22.0.5. Here are the instructions on how to reproduce it:

Starting point: LDAP configured and several users synchronized to Keycloak. LDAP connection works fine.

  • Change any user related attribute, e.g., "UUID LDAP attribute" to incorrect value and save the change.
  • Click "Sync all users" from the Action-dropdown. Alternatively wait for the next periodic sync (if enabled).
  • Check the user list: All LDAP users are now deleted from Keycloak side.

When you change the modified attribute (e.g., "UUID LDAP attribute") back to the original value and do the sync again, the users will reappear but with different user IDs, so e.g. group and role information will be lost.
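To illustrate why a new ID means lost group and role information, here is a minimal sketch, assuming that role mappings are keyed by the internal user ID. `LocalUserStore` and its methods are hypothetical stand-ins written for this example, not Keycloak classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory model for illustration only; not Keycloak's storage code.
final class LocalUserStore {
    // username -> internal user ID
    private final Map<String, String> idByUsername = new HashMap<>();
    // internal user ID -> roles assigned on the Keycloak side
    private final Map<String, Set<String>> rolesByUserId = new HashMap<>();

    // Importing a user always creates a fresh internal ID.
    String importUser(String username) {
        String id = UUID.randomUUID().toString();
        idByUsername.put(username, id);
        return id;
    }

    void grantRole(String userId, String role) {
        rolesByUserId.computeIfAbsent(userId, k -> new HashSet<>()).add(role);
    }

    // Deleting the user also drops everything keyed by its ID.
    void deleteUser(String username) {
        String id = idByUsername.remove(username);
        if (id != null) {
            rolesByUserId.remove(id);
        }
    }

    Set<String> rolesOf(String username) {
        String id = idByUsername.get(username);
        return id == null ? Set.of() : rolesByUserId.getOrDefault(id, Set.of());
    }

    public static void main(String[] args) {
        LocalUserStore store = new LocalUserStore();
        String oldId = store.importUser("alice");
        store.grantRole(oldId, "app-admin");

        store.deleteUser("alice");                 // silent deletion
        String newId = store.importUser("alice");  // re-created on the next sync

        System.out.println("same id: " + oldId.equals(newId));
        System.out.println("roles after re-import: " + store.rolesOf("alice"));
    }
}
```

In this model the re-imported user has the same username but a different ID, so nothing that referenced the old ID can be found again.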

Also temporary outages in the LDAP connection cause the same issue. In that case LDAP attributes were not touched, but the users are silently deleted.

This bug has been there for a really long time. Would it be possible to get a quick fix for it?

vkorpi avatar Nov 14 '23 12:11 vkorpi

@vkorpi I've tried to reproduce the network outage scenario; the user gets deleted and then recreated with the original ID and group / role membership. Is it the same for you? Looks like two separate but related issues.

dteleguin avatar Nov 25 '23 01:11 dteleguin

@dteleguin In my case the user ID got changed (ID field in Keycloak which has UUID value). Also the group memberships and roles got deleted. You can reproduce the issue for LDAP user test_ldap_user in the following way:

  • Assign test_ldap_user to at least one group in Keycloak side (the group is created in Keycloak, it's not coming from LDAP)
  • Assign one or more roles to test_ldap_user. The roles are created in Keycloak, they are not coming from LDAP.
  • Save ID (UUID) of test_ldap_user somewhere.
  • Go to LDAP user federation settings and change value of "UUID LDAP attribute" to incorrect value. Then click "Save".
  • Try to search the user test_ldap_user --> No users found and Keycloak complains about LDAP error. The user is deleted from Keycloak DB in the background.
  • Go to LDAP user federation settings and change value of "UUID LDAP attribute" back to the correct value. Then click "Save".
  • Try to search the user test_ldap_user again --> Now the user is found (and re-created in the background).
  • Check ID (UUID) of the user. It's now different when comparing to the old ID. Check also group memberships and roles of the user: They are now lost.

I reproduced this today by using Keycloak 22.0.5.

vkorpi avatar Nov 27 '23 07:11 vkorpi

@vkorpi I have been able to reproduce this issue too. However, with network outage it seems to be different. Here's what I've tried:

  • set up OpenLDAP, created schema and users
  • configured LDAP federation in KC
  • configured short full-sync period (~5sec) and short timeout (~1sec)
  • suspended OpenLDAP with killall -STOP slapd
    • alternatively, configure iptables to drop LDAP packets: iptables -I INPUT --protocol tcp --dport 389 -j DROP
  • synchronization now times out, the user disappears from the UI, various LDAP-related errors are shown
  • still, the user is NOT removed from the KC DB
  • then, OpenLDAP is resumed with killall -CONT slapd
    • alternatively, remove iptables rule: iptables -D INPUT --protocol tcp --dport 389 -j DROP
  • the user reappears in the UI and no role/group membership info is lost.

This is why I thought it was a different issue. Is it the same for you?

dteleguin avatar Nov 28 '23 00:11 dteleguin

@dteleguin I noticed the same behavior when shutting down our OpenLDAP test server. User is not in the user list and login fails (which is expected). After restarting the LDAP server, the user comes back to Keycloak (no change to user ID, also assigned groups and roles are not lost). So, this scenario works fine.

However, the other scenario is not working (incorrect LDAP configuration leads to user deletion and user ID change). Something similar can also happen when LDAP side configuration changes and Keycloak still uses the old configuration.

vkorpi avatar Nov 28 '23 11:11 vkorpi

Thank you for reporting this issue. I saw a few similar reports already so it seems there is indeed something going on that needs our attention.

Whenever Keycloak returns a user that originates from LDAP, it always checks the user's validity, so some failure is expected when LDAP becomes unreachable. On the other hand, I admit losing roles and groups is unfortunate and we should try to avoid it.

Now I see two steps to reproduce within this report. The first one from @vkorpi:

  • Assign test_ldap_user to at least one group in Keycloak side (the group is created in Keycloak, it's not coming from LDAP)
  • Assign one or more roles to test_ldap_user. The roles are created in Keycloak, they are not coming from LDAP.
  • Save ID (UUID) of test_ldap_user somewhere.
  • Go to LDAP user federation settings and change value of "UUID LDAP attribute" to incorrect value. Then click "Save".
  • Try to search the user test_ldap_user --> No users found and Keycloak complains about LDAP error. The user is deleted from Keycloak DB in the background.
  • Go to LDAP user federation settings and change value of "UUID LDAP attribute" back to the correct value. Then click "Save".
  • Try to search the user test_ldap_user again --> Now the user is found (and re-created in the background).
  • Check ID (UUID) of the user. It's now different when comparing to the old ID. Check also group memberships and roles of the user: They are now lost.

To me this seems like expected behavior: the LDAP server is reachable and the user that Keycloak knows does not match any user in LDAP, so I would say it is expected to remove such a user.

The second one, from @dteleguin, seems like unwanted behavior; however, I was not able to reproduce it.

  • set up OpenLDAP, created schema and users
  • configured LDAP federation in KC
  • configured short full-sync period (~5sec) and short timeout (~1sec)
  • suspended OpenLDAP with killall -STOP slapd
    • alternatively, configure iptables to drop LDAP packets: iptables -I INPUT --protocol tcp --dport 389 -j DROP
  • synchronization now times out, the user disappears from the UI, various LDAP-related errors are shown
  • still, the user is NOT removed from the KC DB
  • then, OpenLDAP is resumed with killall -CONT slapd
    • alternatively, remove iptables rule: iptables -D INPUT --protocol tcp --dport 389 -j DROP
  • the user reappears in the UI and no role/group membership info is lost.

I tried to stop LDAP and with each failed request I see org.keycloak.models.ModelException: LDAP Query failed exception. In this case, users are not removed; when I start the LDAP again the users still have their roles. See this commit for a reproducer I have created: https://github.com/mhajas/keycloak/commit/1ad678158425186c1fb758d5e10e351432159fb0

You can run the test with the following command:

./mvnw -f testsuite/model test -Pjpa-federation+ldap+infinispan -Dtest=UserSyncTest#testSyncUsersWhenLDAPGoesDown

Currently, I don't see this as a blocker for Keycloak 24 as this is probably not something new or a regression in a recent version. Please let me know if this worked before in some version. Therefore, I am moving this to backlog.

mhajas avatar Feb 12 '24 12:02 mhajas

Hi @mhajas,

thank you for summing up and retesting the different issues that have been reported here. I want to add another case where I'm not sure if it is expected behavior. From my personal point of view, it is not.

If LDAP is used with Edit Mode set to UNSYNCED and Import Users set to on, I would not expect any user to be hidden/deleted at all when the LDAP server is (temporarily) unreachable.

We add custom attributes to users in Keycloak using the API and any API request fails when LDAP is unreachable.

I understand and agree that any login-attempt has to fail. But I honestly don't understand why users get hidden/deleted at any time.

In fact, in the current implementation, even users that got deleted in LDAP never get deleted in Keycloak. They simply don't get any updates anymore during sync and of course they can't log in.

So why does Keycloak delete/hide users on (temporary) outages or in any other circumstance if it does not delete during normal operation?

noraab avatar Feb 13 '24 08:02 noraab

Hello @mhajas,

I also think that this is a critical issue. There should at least be an option like "Don't delete users from Keycloak" in LDAP configuration for making sure that the users don't just disappear when something goes wrong. It's fine if login doesn't work in that case, but users should not be deleted in uncontrolled way.

In the case of a big organization, something might change in LDAP configuration which causes configuration changes or connection losses. It should be possible to investigate the issue and fix it without losing the users from Keycloak in between.

vkorpi avatar Feb 13 '24 12:02 vkorpi

LDAPQuery.java returns a null List both when the LDAP query fails and when the user is not present in the federation. Based on this null check, deleteInvalidUser is called to remove the user from Keycloak. This needs to be changed so that deleteInvalidUser is called only when the user is not present in LDAP, not when there is an exception.

Snippet from UserStorageManager.java

UserStorageProviderModel model = getStorageProviderModel(realm, user.getFederationLink());
if (model == null) {
    // remove linked user with unknown storage provider.
    logger.debugf("Removed user with federation link of unknown storage provider '%s'", user.getUsername());
    deleteInvalidUser(realm, user);
    return null;
}

Snippet from LDAPQuery.java

public List<LDAPObject> getResultList() {
    // Apply mappers now
    LDAPMappersComparator ldapMappersComparator = new LDAPMappersComparator(ldapFedProvider.getLdapIdentityStore().getConfig());
    Collections.sort(mappers, ldapMappersComparator.sortAsc());

    for (ComponentModel mapperModel : mappers) {
        LDAPStorageMapper fedMapper = ldapFedProvider.getMapperManager().getMapper(mapperModel);
        fedMapper.beforeLDAPQuery(this);
    }

    List<LDAPObject> result = new ArrayList<>();

    try {
        for (LDAPObject ldapObject : ldapFedProvider.getLdapIdentityStore().fetchQueryResults(this)) {
            result.add(ldapObject);
        }
    } catch (Exception e) {
        throw new ModelException("LDAP Query failed", e);
    }

    return result;
}

pravsjv avatar Apr 23 '24 16:04 pravsjv

After some research I can state that we have two separate issues here:

  1. An exception is thrown during LDAP query. This could be caused by network disruption or misconfigured LDAP parameters (except UUID LDAP Attribute and User LDAP Filter). The exception is not caught anywhere in Keycloak, so it will be returned to the requestor as a 500 Internal Server Error. In this case, UserStorageManager::deleteInvalidUser will NOT be called, therefore, no users will be deleted. Instead, the users will disappear from the UI until connectivity is restored and/or configuration is fixed. User IDs will remain the same, role and group membership will not be lost.
  2. An empty result set is returned by LDAP query for a particular user or users. This could be caused by either the user(s) being actually deleted from LDAP, or misconfiguration of UUID LDAP Attribute and/or User LDAP Filter. In this case, LDAPQuery::getFirstResult will return null, causing UserStorageManager to silently delete the user. After the configuration is fixed, the users will be recreated with new IDs, and auxiliary data like custom attributes and role/group membership will be lost.
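The two cases above can be sketched as follows. This is a simplified model under stated assumptions, not Keycloak's actual code: `ImportValidationSketch`, `Outcome`, and `QueryFailedException` are hypothetical names, with `QueryFailedException` standing in for the exception thrown when the LDAP query fails. Catching the exception here only models the request ending in a 500; it does not imply that Keycloak catches it.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the two outcomes; all names are invented for illustration.
final class ImportValidationSketch {
    enum Outcome { KEPT, DELETED, ERROR_500 }

    /** Stand-in for the exception raised when the LDAP query itself fails. */
    static final class QueryFailedException extends RuntimeException {
        QueryFailedException(String message) {
            super(message);
        }
    }

    // Case 1: the query throws, so the deletion path is never reached.
    // Case 2: the query succeeds but returns nothing, so the user is treated as invalid.
    static Outcome validate(Supplier<List<String>> ldapQuery) {
        List<String> result;
        try {
            result = ldapQuery.get();
        } catch (QueryFailedException e) {
            return Outcome.ERROR_500; // models the error surfacing to the requestor
        }
        return result.isEmpty() ? Outcome.DELETED : Outcome.KEPT;
    }

    public static void main(String[] args) {
        System.out.println(validate(() -> List.of("uid=test_ldap_user")));
        System.out.println(validate(List::of));
        System.out.println(validate(() -> {
            throw new QueryFailedException("LDAP Query failed");
        }));
    }
}
```

The sketch makes the asymmetry visible: a misconfigured filter and a genuinely deleted user are indistinguishable to the caller, because both produce an empty list.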

Interestingly, the second issue affects only searching/browsing users via the Admin UI; periodic and manual sync are not affected. (The code flow is different and does not involve UserStorageManager::importValidation.) This could be easily tested: change UUID LDAP Attribute to a bogus value - perform full sync - restore UUID LDAP Attribute - perform full sync again, then check the users, no info will be lost.

Also, I can confirm that the issue exists for all KC versions from 21 to 24 (included).

dteleguin avatar May 29 '24 22:05 dteleguin

Thanks @pravsjv for your investigation!

LDAPQuery.java returns a null List both when the LDAP query fails and when the user is not present in the federation.

Luckily, it is not that bad. When LDAPQuery fails, an exception will be thrown, which will prevent user deletion and will result in a 500 Internal Server Error. However, if the query returns an empty set for whatever reason, the user will be deleted from KC.

dteleguin avatar May 29 '24 22:05 dteleguin

Is there a way to reverse priority in this case? For example, I would like to sync all my users from LDAP to Keycloak. I do need to keep LDAP synced with Keycloak for services that require LDAP for authentication. Keycloak however should be treated as a main user database that LDAP syncs from. If I change the edit mode to UNSYNCED to prevent users being deleted/hidden when LDAP server is down, upon sync, data from LDAP overwrites Keycloak while I would want the opposite to happen.

muppeth avatar Aug 23 '24 06:08 muppeth

I think the summary here is (thanks to previous investigations on the matter):

  • We do not remove users from local storage if there are connectivity issues with LDAP or when running queries.
  • Users are removed from local if the LDAP query does not resolve to a corresponding LDAP entry. For instance, user is deleted. But it could also be because the provider is misconfigured.

The best I can think of is to provide a configuration option to decide whether users should be deleted from Keycloak when there is no mapping between a user in the local database and its corresponding LDAP entry. Isn't this how UNSYNCED should work anyways?
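A minimal sketch of what such an option could look like, purely as an assumption for discussion: the flag name `removeInvalidUsers` and the class `InvalidUserPolicySketch` are hypothetical and do not correspond to an existing Keycloak setting.

```java
// Hypothetical sketch; the option name and types are invented for illustration.
final class InvalidUserPolicySketch {
    enum Action { KEEP_USER, REMOVE_USER }

    /**
     * Decides what to do with a local user after a successful LDAP lookup.
     *
     * @param ldapEntryFound     whether the lookup resolved to an LDAP entry
     * @param removeInvalidUsers hypothetical provider option; true keeps today's behavior
     */
    static Action decide(boolean ldapEntryFound, boolean removeInvalidUsers) {
        if (ldapEntryFound) {
            return Action.KEEP_USER;
        }
        return removeInvalidUsers ? Action.REMOVE_USER : Action.KEEP_USER;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true));
        System.out.println(decide(false, false));
    }
}
```

Defaulting the flag to true would preserve the current precedence of LDAP over local storage, while letting administrators opt out of deletion.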

pedroigor avatar Nov 07 '24 15:11 pedroigor

The behavior described in the original issue is expected: users in the local storage are deleted whenever their corresponding LDAP entries cannot be retrieved when executing queries. The reason is that the LDAP provider(s) take precedence over users in the local storage.

For this reason, I'm closing this issue.

However, for some use cases, disabling the removal of local users when they don't map to an entry in LDAP might make sense. For that, I created https://github.com/keycloak/keycloak/issues/34764.

Associated with this issue, we also improved the message when there are failures when querying the LDAP. See https://github.com/keycloak/keycloak/pull/34761.

pedroigor avatar Nov 07 '24 20:11 pedroigor

Not finding a user because it no longer exists in LDAP and syncing that change to Keycloak is very different from an actual network connectivity issue and wiping away thousands of users. I can absolutely agree LDAP should be the source of truth, but destructive actions due to network faults shouldn't be the expected outcome. Myself and others in the thread have shown that in at least some cases Keycloak has not caught the error in connectivity properly and user data was lost.

zts3y avatar Nov 07 '24 20:11 zts3y

Myself and others in the thread have shown that in at least some cases Keycloak has not caught the error in connectivity properly and user data was lost.

Unfortunately, we have also experienced that and turned off the removal in our fork for years (with a dirty hack). This led to many invalid users over the years, but it is still better than removing valid users during an Active Directory Update.

yanxch avatar Nov 08 '24 20:11 yanxch

We do not remove users from local storage if there are connectivity issues with LDAP or when running queries.

I think based on all the comments this does not seem to be true. Many people here are writing about instances where connectivity errors lead to users being deleted which should not happen. Maybe some errors are silenced along the way and converted into an empty response set?

septatrix avatar Nov 08 '24 23:11 septatrix

Perhaps what is missing here is more context about the LDAP provider configuration and the steps to reproduce. Sorry, if I missed any step but I could not reproduce the users being deleted when failing to connect to the LDAP service.

The LDAP provider settings are full of combinations to change how the LDAP provider behaves. Can you provide your configuration?

pedroigor avatar Nov 11 '24 14:11 pedroigor

We have an OpenLDAP proxy in front of an Active Directory (with 3 Pods). It's not in our ownership so I don't know much about the configuration of the LDAP itself.

KC Settings:

  • Vendor: Active Directory
  • Use Truststore SPI: ldapsOnly
  • Connection Pooling: On
  • Connection Timeout: 35000
  • Bind type: simple
  • Edit Mode: UNSYNCED
  • Search Scope: Subtree
  • Pagination: On
  • Import Users: On
  • Sync Registrations: Off
  • Batch Size: 100
  • Periodic changed users sync
  • Changed users sync period: 86400
  • Cache Policy: DEFAULT
  • No Advanced Settings

@pedroigor imo it's like chasing a ghost, and I was fine that you closed this issue. I'll prepare a PR for #34764.

  1. Maybe we can find a way to log the LDAP responses (I have not looked into this yet)

  2. This is a totally different idea: Would it be feasible to create an option to delete an invalid user after a given amount of time?
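The second idea could be sketched roughly like this, assuming the provider records when a user was first found missing and only reports it as removable after a configurable grace period. `GracePeriodSketch` and its methods are hypothetical and not part of Keycloak.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of delayed removal; not an existing Keycloak feature.
final class GracePeriodSketch {
    private final Duration gracePeriod;
    // user ID -> time the user was first observed as missing from LDAP
    private final Map<String, Instant> firstMissingAt = new HashMap<>();

    GracePeriodSketch(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    /** Call when a lookup found the user's LDAP entry; clears any pending mark. */
    void seen(String userId) {
        firstMissingAt.remove(userId);
    }

    /**
     * Call when a lookup returned no entry for the user.
     * Returns true only once the user has been missing for at least the grace period.
     */
    boolean missingAndExpired(String userId, Instant now) {
        Instant first = firstMissingAt.computeIfAbsent(userId, k -> now);
        return !now.isBefore(first.plus(gracePeriod));
    }

    public static void main(String[] args) {
        GracePeriodSketch tracker = new GracePeriodSketch(Duration.ofHours(24));
        Instant t0 = Instant.parse("2024-11-14T15:00:00Z");
        System.out.println(tracker.missingAndExpired("user-1", t0));
        System.out.println(tracker.missingAndExpired("user-1", t0.plus(Duration.ofHours(25))));
    }
}
```

A short outage or a briefly misconfigured filter would then only mark users as missing; the mark is cleared as soon as a later lookup finds the entry again.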

yanxch avatar Nov 14 '24 15:11 yanxch

@yanxch Awesome! Thanks for your time on https://github.com/keycloak/keycloak/issues/34764.

  1. Maybe we can find a way to log the LDAP responses (I have not looked into this yet)
  2. This is a totally different idea: Would it be feasible to create an option to delete an invalid user after a given amount of time?

I'm not sure about either. IMO, we can start with your changes to avoid removing users when they cannot be queried in LDAP. I'm afraid we could end up with a lot of settings if we don't think more about the use cases where you want to handle invalid users/queries from LDAP.

I also agree we can close this issue because we can't do much but start with your proposal in https://github.com/keycloak/keycloak/issues/34764.

pedroigor avatar Nov 21 '24 20:11 pedroigor

Thanks for reporting this issue. However, after review this is not considered a valid issue, or has been recently resolved.

As the issue is not valid it will be automatically closed.