bonsai icon indicating copy to clipboard operation
bonsai copied to clipboard

Casting an LDAPSearchIter from paged_search to a list at the wrong time causes [WinError 10054]

Open gpregger-ethz opened this issue 8 months ago • 8 comments

Hi

I'm writing a small directory utility and was stuck at a peculiar issue after I switched from LDAPConnection.search to LDAPConnection.paged_search.

For this I'm only interested in reading a limited number of entries, so simply cast the search results to a list, but my script would just terminate without explanation. No exception to catch, no error message at all.

While playing around in the debugger I was eventually met with [WinError 10054] An existing connection was forcibly closed by the remote host.

I since spent some time trying to boil down the reason this happens and it seems to be connected to when I'm turning the LDAPSearchIter into a list.

I can probably work around this now, but here is as minimal of an example as I could create to hopefully reproduce the issue:

This works and prints the found entries:

import bonsai
from pprint import pprint

URL = 'ldaps://ad.xyz.com:636'       # I'm querying an AD if it's relevant
BASE = 'dc=ad,dc=xyz,dc=com'
FILTER = '(&(cn={username})(objectClass=user))'
USER = user_login
PASS = user_password

def main():
    client = bonsai.LDAPClient(url=URL, tls=True)
    client.set_credentials(mechanism='SIMPLE', user=USER, password=PASS)
    client.auto_page_acquire = False

    response = custom_paged_search(client)
    results = list(response)

    pprint(results)

def custom_paged_search(client):
    with client.connect() as conn:
        response = conn.paged_search(
            base=BASE,
            scope=bonsai.LDAPSearchScope.SUBTREE,
            filter_exp=FILTER
        )
        return response


if __name__ == '__main__':
    main()

And this does not work, the script exits without apparent error, with the only difference being the combined function call / list cast on line 15:

import bonsai
from pprint import pprint

URL = 'ldaps://ad.xyz.com:636'       # I'm querying an AD if it's relevant
BASE = 'dc=ad,dc=xyz,dc=com'
FILTER = '(&(cn={username})(objectClass=user))'
USER = user_login
PASS = user_password

def main():
    client = bonsai.LDAPClient(url=URL, tls=True)
    client.set_credentials(mechanism='SIMPLE', user=USER, password=PASS)
    client.auto_page_acquire = False

    results = list(custom_paged_search(client))

    pprint(results)

def custom_paged_search(client):
    with client.connect() as conn:
        response = conn.paged_search(
            base=BASE,
            scope=bonsai.LDAPSearchScope.SUBTREE,
            filter_exp=FILTER
        )
        return response


if __name__ == '__main__':
    main()

Is this a bug in bonsai or maybe a gap in my understanding of the situation?

gpregger-ethz avatar Jun 03 '25 07:06 gpregger-ethz

Update: Not sure if that's how it's intended but if I add response.acquire_next_page() in the custom_paged_search function right before the return, the issue seems to go away and I still only get the first page of results in the response, just how I wanted.

Feel free to close this issue if this is intended behavior. A brief explanation would be appreciated :)

gpregger-ethz avatar Jun 06 '25 08:06 gpregger-ethz

This definitely looks odd. Could you please send me the debug logs for the three different scripts? That might help figuring out what's going on. You can turn it on with bonsai.set_debug(True) right after the module import.

noirello avatar Jun 10 '25 22:06 noirello

Here is my test script with the 3 test cases:

import bonsai
bonsai.set_debug(True)
from pprint import pprint

URL = 'ldaps://ad.xyz.com:636'
BASE = 'dc=ad,dc=xyz,dc=com'
FILTER = f'(&(cn={username})(objectClass=user))'
USER = username
PASS = password

DEBUG_VARIANT = 'combined_cast'                 # This doesn't work
#DEBUG_VARIANT = 'separate_cast'                # This works here, but is unreliable in my project where the call stack is a lot deeper
#DEBUG_VARIANT = 'explicit_acquire_next_page'   # This so far works well

def main():
    client = bonsai.LDAPClient(url=URL, tls=True)
    client.set_credentials(mechanism='SIMPLE', user=USER, password=PASS)
    client.auto_page_acquire = False

    match DEBUG_VARIANT:
        case 'combined_cast':
            results = list(custom_paged_search(client))
        case 'separate_cast':
            response = custom_paged_search(client)
            results = list(response)
        case 'explicit_acquire_next_page':
            results = list(custom_paged_search_fixed(client))

    pprint(results)

def custom_paged_search(client):
    with client.connect() as conn:
        response = conn.paged_search(
            base=BASE,
            scope=bonsai.LDAPSearchScope.SUBTREE,
            filter_exp=FILTER,
            attrlist=['cn']
        )
        return response

def custom_paged_search_fixed(client):
    with client.connect() as conn:
        response = conn.paged_search(
            base=BASE,
            scope=bonsai.LDAPSearchScope.SUBTREE,
            filter_exp=FILTER,
            attrlist=['cn']
        )
        response.acquire_next_page()
        return response


if __name__ == '__main__':
    main()

And here the debug outputs:

  • For DEBUG_VARIANT = 'combined_cast' (no pprint in stdout)
DBG: ldapconnection_new [self:0000023204DEAB70]                                                           s\x5cbonsai_paged_search\x5cmain.py' 
DBG: ldapconnection_init (self:0000023204DEAB70)
DBG: ldapconnection_open (self:0000023204DEAB70)
DBG: connecting (self:0000023204DEAB70)
DBG: create_conn_info (mech:SIMPLE, sock:-1, creds:00000232058F6CC0)
DBG: ldapconnectiter_new [self:0000023205928260]
DBG: create_init_thread_data (client:0000023204D1B4D0, sock:-1)
DBG: create_init_thread (ld:00000232020D8780, info:00000232020D4960, thread:0)
DBG: ldapconnection_result (self:0000023204DEAB70, args:000002320591DC00, kwds:0000000000000000)[msgid:-1]
DBG: LDAPConnection_Result (self:0000023204DEAB70, msgid:-1, millisec:-1)
DBG: LDAPConnectIter_Next (self:0000023205928260, timeout:-1) [tls:0, state:0]
DBG: ldap_init_thread_func (params:00000232020D8780)
DBG: _ldap_finish_init_thread (async:0, thread:0000000000000164, timeout:-1, misc:00000232020D8780)
DBG: ldap_init_thread_func [retval:0]
DBG: binding [state:3]
DBG: _ldap_bind (ld:00000232045F82D8, info:00000232020D4960, ppolicy:0, result:0000000000000000, msgid:0)
DBG: LDAPConnectIter_Next (self:0000023205928260, timeout:-1) [tls:0, state:4]
DBG: binding [state:4]
DBG: ldap_thread_bind (params:00000232020D4960)
DBG: ldapconnectiter_dealloc (self:0000023205928260)
DBG: dealloc_conn_info (info:00000232020D4960)
DBG: ldapconnection_search (self:0000023204DEAB70, args:00000232058F91C0, kwds:0000000000000000)
DBG: LDAPConnection_IsClosed (self:0000023204DEAB70)
DBG: ldapsearchiter_new [self:0000023205928800]
DBG: LDAPSearchIter_New (conn:0000023204DEAB70)[self:0000023205928800]
DBG: LDAPConnection_Searching (self:0000023204DEAB70, params_in:00000085A5DEEA10, iterator:0000023205928800)
DBG: ldapconnection_result (self:0000023204DEAB70, args:000002320591DC00, kwds:0000000000000000)[msgid:2]
DBG: LDAPConnection_Result (self:0000023204DEAB70, msgid:2, millisec:-1)
DBG: parse_search_result (self:0000023204DEAB70, res:00000232046E2878, obj:0000023205928800)
DBG: LDAPEntry_FromLDAPMessage (entrymsg:00000232046E2878, conn:0000023204DEAB70)[dn:CN=gpregger,DC=ad,DC=xyz,DC=com]
DBG: ldapentry_new [self:00000232058A3EE0]
DBG: ldapentry_init (self:00000232058A3EE0)
DBG: LDAPEntry_SetDN (self:00000232058A3EE0, value:00000232058D78D0)
DBG: ldapconnection_close (self:0000023204DEAB70)
DBG: ldapsearchiter_iternext (self:0000023205928800)
DBG: ldapsearchiter_iternext (self:0000023205928800)
DBG: ldapsearchiter_dealloc (self:0000023205928800)
  • For DEBUG_VARIANT = 'separate_cast' (pprint in stdout, but in production script very inconsistent, so not a viable workaround)
DBG: ldapconnection_new [self:0000024191DDAB70]
DBG: ldapconnection_init (self:0000024191DDAB70)
DBG: ldapconnection_open (self:0000024191DDAB70)
DBG: connecting (self:0000024191DDAB70)
DBG: create_conn_info (mech:SIMPLE, sock:-1, creds:00000241928A6D00)
DBG: ldapconnectiter_new [self:00000241928D82B0]
DBG: create_init_thread_data (client:000002418F1EBBC0, sock:-1)
DBG: create_init_thread (ld:0000024191865980, info:000002418F12CC70, thread:0)
DBG: ldapconnection_result (self:0000024191DDAB70, args:00000241928CDC80, kwds:0000000000000000)[msgid:-1]
DBG: LDAPConnection_Result (self:0000024191DDAB70, msgid:-1, millisec:-1)
DBG: LDAPConnectIter_Next (self:00000241928D82B0, timeout:-1) [tls:0, state:0]
DBG: ldap_init_thread_func (params:0000024191865980)
DBG: _ldap_finish_init_thread (async:0, thread:00000000000003DC, timeout:-1, misc:0000024191865980)
DBG: ldap_init_thread_func [retval:0]
DBG: binding [state:3]
DBG: _ldap_bind (ld:000002419162F348, info:000002418F12CC70, ppolicy:0, result:0000000000000000, msgid:0)
DBG: LDAPConnectIter_Next (self:00000241928D82B0, timeout:-1) [tls:0, state:4]
DBG: binding [state:4]
DBG: ldap_thread_bind (params:000002418F12CC70)
DBG: ldapconnectiter_dealloc (self:00000241928D82B0)
DBG: dealloc_conn_info (info:000002418F12CC70)
DBG: ldapconnection_search (self:0000024191DDAB70, args:00000241928A91C0, kwds:0000000000000000)
DBG: LDAPConnection_IsClosed (self:0000024191DDAB70)
DBG: ldapsearchiter_new [self:00000241928D8850]
DBG: LDAPSearchIter_New (conn:0000024191DDAB70)[self:00000241928D8850]
DBG: LDAPConnection_Searching (self:0000024191DDAB70, params_in:000000EBA99EEA30, iterator:00000241928D8850)
DBG: ldapconnection_result (self:0000024191DDAB70, args:00000241928CDC80, kwds:0000000000000000)[msgid:2]
DBG: LDAPConnection_Result (self:0000024191DDAB70, msgid:2, millisec:-1)
DBG: parse_search_result (self:0000024191DDAB70, res:0000024191895A58, obj:00000241928D8850)
DBG: LDAPEntry_FromLDAPMessage (entrymsg:0000024191895A58, conn:0000024191DDAB70)[dn:CN=gpregger,DC=ad,DC=xyz,DC=com]
DBG: ldapentry_new [self:0000024192853EE0]
DBG: ldapentry_init (self:0000024192853EE0)
DBG: LDAPEntry_SetDN (self:0000024192853EE0, value:00000241928878D0)
DBG: ldapconnection_close (self:0000024191DDAB70)
DBG: ldapsearchiter_iternext (self:00000241928D8850)
DBG: ldapsearchiter_iternext (self:00000241928D8850)
[{'cn': ['gpregger'],
  'dn': <LDAPDN CN=gpregger,DC=ad,DC=xyz,DC=com>}]
DBG: ldapentry_dealloc (self:0000024192853EE0)
DBG: ldapentry_clear (self:0000024192853EE0)
DBG: ldapsearchiter_dealloc (self:00000241928D8850)
DBG: ldapconnection_dealloc (self:0000024191DDAB70)
  • For DEBUG_VARIANT = 'explicit_acquire_next_page' (best 'workaround' so far)
DBG: ldapconnection_init (self:000001BEF04AAB70)
DBG: ldapconnection_open (self:000001BEF04AAB70)
DBG: connecting (self:000001BEF04AAB70)
DBG: create_conn_info (mech:SIMPLE, sock:-1, creds:000001BEF0F66D80)
DBG: ldapconnectiter_new [self:000001BEF0FA8260]
DBG: create_init_thread_data (client:000001BEED8FBBC0, sock:-1)
DBG: create_init_thread (ld:000001BEED6C7EC0, info:000001BEED6C3440, thread:0)
DBG: ldapconnection_result (self:000001BEF04AAB70, args:000001BEF0F8DC80, kwds:0000000000000000)[msgid:-1]
DBG: LDAPConnection_Result (self:000001BEF04AAB70, msgid:-1, millisec:-1)
DBG: LDAPConnectIter_Next (self:000001BEF0FA8260, timeout:-1) [tls:0, state:0]
DBG: ldap_init_thread_func (params:000001BEED6C7EC0)
DBG: _ldap_finish_init_thread (async:0, thread:00000000000003C8, timeout:-1, misc:000001BEED6C7EC0)
DBG: ldap_init_thread_func [retval:0]
DBG: binding [state:3]
DBG: _ldap_bind (ld:000001BEEFC04DC8, info:000001BEED6C3440, ppolicy:0, result:0000000000000000, msgid:0)
DBG: LDAPConnectIter_Next (self:000001BEF0FA8260, timeout:-1) [tls:0, state:4]
DBG: binding [state:4]
DBG: ldap_thread_bind (params:000001BEED6C3440)
DBG: ldapconnectiter_dealloc (self:000001BEF0FA8260)
DBG: dealloc_conn_info (info:000001BEED6C3440)
DBG: ldapconnection_search (self:000001BEF04AAB70, args:000001BEF0F69080, kwds:0000000000000000)
DBG: LDAPConnection_IsClosed (self:000001BEF04AAB70)
DBG: ldapsearchiter_new [self:000001BEF0FA8800]
DBG: LDAPSearchIter_New (conn:000001BEF04AAB70)[self:000001BEF0FA8800]
DBG: LDAPConnection_Searching (self:000001BEF04AAB70, params_in:0000002DABFEEBD0, iterator:000001BEF0FA8800)
DBG: ldapconnection_result (self:000001BEF04AAB70, args:000001BEF0F8DC80, kwds:0000000000000000)[msgid:2]
DBG: LDAPConnection_Result (self:000001BEF04AAB70, msgid:2, millisec:-1)
DBG: parse_search_result (self:000001BEF04AAB70, res:000001BEEFDB2788, obj:000001BEF0FA8800)
DBG: LDAPEntry_FromLDAPMessage (entrymsg:000001BEEFDB2788, conn:000001BEF04AAB70)[dn:CN=gpregger,DC=ad,DC=xyz,DC=com]
DBG: ldapentry_new [self:000001BEF0F13EE0]
DBG: ldapentry_init (self:000001BEF0F13EE0)
DBG: LDAPEntry_SetDN (self:000001BEF0F13EE0, value:000001BEF0F478D0)
DBG: ldapsearchiter_acquirenextpage (self:000001BEF0FA8800) cookie:000001BEEDF51118
DBG: ldapconnection_close (self:000001BEF04AAB70)
DBG: ldapsearchiter_iternext (self:000001BEF0FA8800)
DBG: ldapsearchiter_iternext (self:000001BEF0FA8800)
DBG: ldapsearchiter_dealloc (self:000001BEF0FA8800)
[{'cn': ['gpregger'],
  'dn': <LDAPDN CN=gpregger,DC=ad,DC=xyz,DC=com>}]
DBG: ldapentry_dealloc (self:000001BEF0F13EE0)
DBG: ldapentry_clear (self:000001BEF0F13EE0)
DBG: ldapconnection_dealloc (self:000001BEF04AAB70)

gpregger-ethz avatar Jun 11 '25 06:06 gpregger-ethz

Sorry for not responding sooner, but I kept tinkering with this issue in the past weeks. And I'm still not sure what's going on.

One thing is certain, though, that your example has shone light on some shortcomings of the ldapsearchiter implementation:

  • First thing I noticed, that changing the auto_page_acquire to true will cause segfault, because the module tries to initiate new operation after the connection's closed and all the related structs are already freed. Same would happen if you called the acquire_next_page method outside of the connection's with context.
  • Spent quite some time on tracking the reference counter related to the ldapsearchiter object (and thus how it will be freed after no reference remained), because there's something odd is happening there (an extra ref inc in the acquire_next_page method that I can't explain).

The first issue was handled with an extra close connection check before the operation. The second one I couldn't fully solve, but I realised that I need to add cyclic GC support, because the connection and searchiter objects could easily point to each other, making it impossible to properly free them by the GC.

I added a new test that's similar to your example, and while it segfaults on Windows with the current master branch, it runs without problem with the new changes I made. So that's progress, but your experience (no printed entries / WinError 10054) seems to be different than mine (segfaulting on Windows), therefore I'm not entirely sure I solved your issue too.

noirello avatar Jul 14 '25 17:07 noirello

No worries, I appreciate the continued efforts!

FWIW I've integrated bonsai into my own package and haven't experienced any issues since I've found the mentioned workaround.

gpregger-ethz avatar Jul 15 '25 06:07 gpregger-ethz

I've made a new release with changes regarding to this issue. I couldn't really reproduce the same outcome that you described during my development, so it would be much appreciated if you gave the new 1.5.4 version a try with your setup.

noirello avatar Jul 20 '25 13:07 noirello

Thanks for the update! Unfortunately I'm still seeing the same behavior. I've tried to narrow it down further and found that it there seems to be a timing component involved. If I set a breakpoint on the line results = list(custom_paged_search(client)) (line 22 in the test script) and then manually step through the remaining lines with a second pause at every line, the pprint is executed and everything seems fine.
But if I continue automatic script execution it fails as before. I went back to 1.5.3 and it behaves the same.

Maybe you can get something out of a video?

https://github.com/user-attachments/assets/b84f51b0-8605-4af5-8804-f87fe3e129eb

gpregger-ethz avatar Jul 21 '25 09:07 gpregger-ethz

Thank you for checking it. Unfortunately, I still haven't got closer what's really going on here and how to fix this. 🫤

noirello avatar Jul 22 '25 21:07 noirello