graylog2-web-interface
graylog2-web-interface copied to clipboard
Active directory auth fails on first attempt, succeeds on second
We noticed that after enabling AD authentication on our Graylog 1.2.2 the first attempt was always unsuccessful, while the second was always successful.
So I did a packet dump during the logins to try and narrow it down. This is what happens:
Packet dump started User try to login 1st time Login fail Wait for 20 seconds Packet dump stopped
New packet dump started User try to login 2nd time Login successfull Wait 20 seconds Packet dump stopped
When checking the packet dump I can see that after the 1st login, no attempt to communicate with LDAP servers were made. When 2nd login attempt was done, I could see LDAP communication. I can successful replicate this issue every time by logging out user and attempt to login again.
What debugs and logs could be useful here? Is this a known issue?
We have seen reports of this but were unable to reproduce the issue so far. Does the failed attempt take 2 seconds to fail or does it fail immediately?
I tested now. I counted to about 3 seconds to fail. Another observation is that it doesn't matter what you put as password the first time, as long as you put in the correct password the second time.
Ok, this sounds like a problem with the connection pool. I'll take a look if I see anything obvious.
Thanks. Let me know if you want me to run any tests on my system, since I can replicate the issue every time.
Can you please check what the output in the server log is when you set the log level of org.apache.directory.ldap.client.api.LdapNetworkConnection to DEBUG?
For how to do that please see: http://docs.graylog.org/en/1.2/pages/installation/manual_setup.html?highlight=external#supplying-external-logging-configuration or you could use the REST API browser.
Ok so I edited /opt/graylog/conf/log4j.xml and appended
<logger name="rg.apache.directory.ldap.client.api.LdapNetworkConnection">
<level value="debug"/>
</logger>
Then I did a graylog-ctl restart I then replicated the issue a few times. graylog-ldap.txt
Thanks!
This looks like the connection simply times out the first time, I wonder what it does all that time. Can you play with the LDAP connection timeout setting in graylog.conf:
ldap_connection_timeout = 5000 # default is 2000 milliseconds
Increasing it should make the problem go away, apparently the connection to the server takes quite some time. If that's successfull we'll increase the default timeout which should solve the issue for most people.
No difference. Now it takes 5 seconds for it to fail the first time. Remember that I dumped the traffic on the first attempt, and it didnt even try to connect to any LDAP server until the second attempt.
when the first failure to connect happens, is there any line in the server log that say Couldn't connect to LDAP directory?
Do you have the same problem when testing the LDAP configuration? With both "Test connection" and "Test login"? Those parts use the exact same code as the login itself.
So here is a text file which is only the log for a failed attempt graylog-ldap.txt
Testing "test connection" and "test login" works fine when logged in.
I simply cannot reproduce this locally. For me the first request simply succeeds and I cannot even see any way how a second connection is any different from the first.
Is there anything special about your setup? High latency to AD, loadbalancer in front etc?
Yeah this is super weird
So lets see how the work flow is
I go to graylog and I get this login box
After a few seconds I get this box
Which after a few more seconds changes to this box

In the log we can see 2015-12-09_11:09:26.75305 ERROR [LdapConnector] Timed out connecting to LDAP server 2015-12-09_11:09:26.75656 ERROR [LdapUserAuthenticator] LDAP error 2015-12-09_11:09:26.78837 WARN [SessionsResource] Unable to log in user jobn 2015-12-09_11:09:26.81711 DEBUG [LdapNetworkConnection] ------>> Cannot get the connection... Retrying 2015-12-09_11:09:26.84720 DEBUG [LdapNetworkConnection] Interrupted while waiting for connection to establish with server XXXX.local:389
However if we check the packet capture we took on the graylog host
So we see traffic to and from the graylog server. Lets now filter on all traffic which has TCP port 389
Nothing
If I compare with a packet capture during second and successful login attempt take in the same way

Ok so we know that if no connection is made, could it be DNS?
So in our unsuccessfull attempt we see that graylog is resolving DNS in the exact same way as the successful.
Hi,
I am not sure if this is relevant at all, but I just ran into a similar issue. I thought I would post my findings as this was really odd to me as well.
What happened was that I configured my LDAP config (userfilter and groupfilter) with something like:
ldap.groupFilter= cn=groups,cn=accounts,dc=test,dc=com
Other properties accordingly - not important.
What happened was that each first attempt returned unauthenticated, while each subsquent attempt does authenticate.
In the backgroup, my configuration file used apache PropertySubstitutor. This particular class uses a Property resolver that reads , signs as separators, hence the group filter went from being a string, to being an array of strings (each , separator is 1 entry). So the configuration just didn't work, which is why the initial authentication failed. (I do not know why every other authentication worked .. it was kind of self healing). I only realised this after doing the config manually in a little test program and realising that it did in fact work.
I thought I'd post this, as this could be why one of you can't reproduce it, while it behaves this strangely for others.
Of course I don't know if that's still relevant or if I'm even making sense in this case :) google lead me here, i thought maybe this could help, if not please ignore me :)
Regards, Artur
Hi,
I have the same issue. Users have to try 2 attempts to log in. Same result as @jontaa
Same here. I just updated to 2.1.1 but unfortunately it still happens.
Hi Guys, This can be caused by any recent Microsoft update which might have tickled .net frame work settings, in our case we went through series of log collection nothing helped us, but cloning the PC and upgrading . Net frame work to version 4 helped us to fix the problem.