
Error: ldap identifier backend logon connect error: LDAP Result Code 200 "Network Error": tls: failed to verify certificate

Open nikslor opened this issue 1 year ago • 6 comments

Describe the bug

ocis 5.0rc4 (and earlier versions too) starts but doesn't work and shows the following message in the logs when I try to log in:

{"level":"error","service":"idp","error":"ldap identifier backend logon connect error: LDAP Result Code 200 \"Network Error\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2024-02-28T17:31:56+01:00 is after 2024-01-29T20:04:27Z","time":"2024-02-28T17:31:56+01:00","message":"identifier failed to logon with backend"}

I don't know exactly how I got into this situation. After a hardware failure, this instance was down for a few weeks, probably from some time before 2024-01-29 until after that date.

Deleting the following files and restarting ocis fixed the problem:

  • /var/lib/ocis/idm -> delete ldap.crt and ldap.key
  • /var/lib/ocis/idp -> delete encryption.key and private-key.pem

Expected behavior

As far as I can see, the certs in /var/lib/ocis/idm and /var/lib/ocis/idp are automatically copied / generated, so if they are outdated, ocis should probably do one of the following things:

  a) refuse to start and give a proper error message
  b) copy/generate new versions of the files (the same way it was done originally)

Actual behavior

ocis starts but doesn't work properly; the administrator has to debug and find the solution on their own.

Setup

systemd-based instance, with the following config:

OCIS_BASE_DATA_PATH=/var/lib/ocis
ACCOUNTS_DEMO_USERS_AND_GROUPS=false
PROXY_HTTP_ADDR=0.0.0.0:443
OCIS_URL=https://foo.bar.com
PROXY_TRANSPORT_TLS_KEY=/etc/letsencrypt/live/foo.bar.com/privkey.pem
PROXY_TRANSPORT_TLS_CERT=/etc/letsencrypt/live/foo.bar.com/fullchain.pem
OCIS_INSECURE=false
PROXY_ENABLE_BASIC_AUTH=true

nikslor, Mar 01 '24

@rhafer @dragonchaser @butonic what is your opinion on that? Do you agree with the expected behavior?

micbar, Mar 01 '24

Today I had the same issue. Simply renaming / removing the ldap.key and ldap.crt forces regeneration and everything is working fine again.

iFrozenPhoenix, Mar 22 '24

Had this issue today on 5.0.1. Followed the steps from the first post, now it works again. (running in Docker, behind Traefik, and using Keycloak as IdP)

tkintscher, Apr 11 '24

@micbar I agree, but I would not automatically update the certs; I'd opt for option (a) (refuse to start).

dragonchaser, Apr 17 '24

Refusing to start with an expired certificate is OK, I guess. One could also argue that it is OK to just accept the expired certificate (it is insecure anyway since it is self-signed, and probably issued for the wrong subject).

It also raises the question of what we should do at runtime when the certificate expires. Exit with an error? Or just continue to run and log an error?

Somehow I think the real issue here is that we're enforcing SSL for LDAP even when the server (libregraph/idm) is just listening on the loopback interface, which is the case in the single binary setup. I guess it would be ok to allow unencrypted LDAP in that case.
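The loopback condition suggested here can be checked from the configured LDAP URI alone. A minimal Go sketch, assuming the endpoint is available as an ldap:// or ldaps:// URI; `isLoopbackLDAP` is a hypothetical helper, and the example URIs are illustrative, not oCIS defaults.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isLoopbackLDAP reports whether an LDAP URI points at the loopback
// interface. Only literal loopback IPs and the name "localhost" count; no DNS
// lookup is made, so other names that happen to resolve to 127.0.0.1 are
// treated as remote and would keep using TLS.
func isLoopbackLDAP(rawURI string) (bool, error) {
	u, err := url.Parse(rawURI)
	if err != nil {
		return false, err
	}
	if u.Scheme != "ldap" && u.Scheme != "ldaps" {
		return false, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	host := u.Hostname() // strips the port and IPv6 brackets
	if strings.EqualFold(host, "localhost") {
		return true, nil
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback(), nil
}

func main() {
	for _, uri := range []string{
		"ldaps://127.0.0.1:636",
		"ldaps://[::1]:636",
		"ldaps://ldap.example.com:636",
	} {
		local, err := isLoopbackLDAP(uri)
		fmt.Println(uri, local, err)
	}
}
```

Skipping DNS resolution is a deliberate choice in this sketch: it keeps the check cheap and errs on the side of using TLS when in doubt.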

rhafer, Apr 17 '24

@dragonchaser why would you not regenerate the certs if they are expired and self-signed? Not doing so raises the question of why the certificates are generated at all on the first run if the app is not capable of renewing them. Making only the first run easy doesn't seem fair. I think either don't generate any certificates at all and require the user to provide them, ideally from a public CA, or generate the certificates and manage them, i.e. regenerate them. I suspect 99% of users (admins) don't even know that there are certs in the all-in-one (aio) deployment until they run into this error. I would expect this error to come up more often soon, because the first deployments have now been running for a while.

iFrozenPhoenix, Apr 17 '24

How can this bug still be present? Claiming OCIS is production ready, but expiring self-signed internal tls certificates require us to search through GitHub issues can not be acceptable.

7ritn, May 22 '24

Well, I guess it's because it's currently classified as expected behavior. See the comments above; they also describe how to renew it. If you want an automated way I guess owncloud is happy to support you with a subscription...

iFrozenPhoenix, May 22 '24

If you want an automated way I guess owncloud is happy to support you with a subscription...

Nice try... But joking aside, cert management is a PITA since we have software

Self-signed certificates are a bit of "snake oil" anyway.

I agree that the admin experience is not great. The broader point is that oCIS is not a monolith. Think of a LAMP stack with a database: in the "old days" we didn't connect to the DB via TLS, which was not secure. An attacker who got access to the internal network could have read the unencrypted data stream.

We decided that ocis should be "secure by default" and tried to make the initial setup as easy as possible.

I would agree that we can follow the advice from @rhafer and use no LDAPs if the ldap server is running on the loopback interface.

micbar, May 27 '24

How can this bug still be present? Claiming OCIS is production ready, but expiring self-signed internal tls certificates require us to search through GitHub issues can not be acceptable.

This is an opensource project, feel free to contribute.

dragonchaser, Jun 04 '24

I think it would make contributing on this issue a lot easier if the team could state the expected behavior.

I would agree that we can follow the advice from @rhafer and use no LDAPs if the ldap server is running on the loopback interface.

Is that the desired solution?

kaivol, Jun 06 '24

I think it would make contributing on this issue a lot easier if the team could state the expected behavior.

I would agree that we can follow the advice from @rhafer and use no LDAPs if the ldap server is running on the loopback interface.

Is that the desired solution?

Yes. That is the desired solution. Do not use TLS when LDAP is on localhost.

micbar, Jun 19 '24