python-irodsclient icon indicating copy to clipboard operation
python-irodsclient copied to clipboard

Exception ignored message when creating failed + valid session

Open qubixes opened this issue 9 months ago • 29 comments

Hi!

During work on one of my PRs we ran into some exception ignored messages such as:

Exception ignored in: <function Connection.__del__ at 0x7af078ff67a0>
Traceback (most recent call last):
  File "/home/qubix/.pyenv/versions/3.11.6/lib/python3.11/site-packages/irods/connection.py", line 132, in __del__
    self.disconnect()
  File "/home/qubix/.pyenv/versions/3.11.6/lib/python3.11/site-packages/irods/connection.py", line 361, in disconnect
    self.send(disconnect_msg)
  File "/home/qubix/.pyenv/versions/3.11.6/lib/python3.11/site-packages/irods/connection.py", line 148, in send
    raise NetworkException("Unable to send message")
irods.exception.NetworkException: Unable to send message

Even though the functionality in iBridges still works fine, this is of course somewhat confusing to the user/us. We first thought it was caused by the new PR, but it seems to originate from the PRC, since installing an old version of PRC removes the message. For reproducing I have found this works relatively consistently:

import sys
from irods.session import iRODSSession

if __name__  == "__main__":
    env_file = sys.argv[1]
    try:
        session = iRODSSession(password="bad_pw", irods_env_file=env_file)
        print(session.server_version)
    except Exception:
        pass
    session = iRODSSession(password="good_pw", irods_env_file=env_file)
    print(session.server_version)

If you can't reproduce it the first time, it might reproduce the second time in my experience.

Looking through the PRC release notes, if found that https://github.com/irods/python-irodsclient/commit/f2332b3ec9bba48913d9a2500391e4f1399248e9 might be related to this, since I think the issue appears in the cleanup phase.

Let me know if you need more information (I get 4.2.12 for the server version)).

qubixes avatar Mar 13 '25 20:03 qubixes

@qubixes What's the scheme used by your client env file ? With native authentication, I wasn't able to reproduce the issue.

d-w-moore avatar Mar 14 '25 14:03 d-w-moore

Also (again to @qubixes ) - assuming I'm able to reproduce at some point - which older version of PRC corrects the problem?

d-w-moore avatar Mar 14 '25 14:03 d-w-moore

Will try with SSL enabled. Sometimes that makes a difference.

@qubixes Since the error is incurred while trying to send a disconnect message to the server, on a socket the iRODS server has already given up on, I am pretty sure functionality won't be threatened by the issue underlying this error message.

Perhaps it would be an option to turn this into an INFO level of logged message, instead of an actual exception. Will discuss with the rest of the iRODS Team.

d-w-moore avatar Mar 14 '25 15:03 d-w-moore

@d-w-moore I'll just paste my irods_environment.json for completeness:

{
    "irods_host": "geo.data.uu.nl",
    "irods_port": 1247,
    "irods_home": "/nluu11p/home",
    "irods_user_name": "<snip>",
    "irods_default_resource": "irodsResc",
    "irods_zone_name": "nluu11p",
    "irods_authentication_scheme": "pam",
    "irods_encryption_algorithm": "AES-256-CBC",
    "irods_encryption_key_size": 32,
    "irods_encryption_num_hash_rounds": 16,
    "irods_encryption_salt_size": 8,
    "irods_client_server_policy": "CS_NEG_REQUIRE",
    "irods_client_server_negotiation": "request_server_negotiation"
}

qubixes avatar Mar 14 '25 15:03 qubixes

Will try with SSL enabled. Sometimes that makes a difference.

@qubixes Since the error is incurred while trying to send a disconnect message to the server, on a socket the iRODS server has already given up on, I am pretty sure functionality won't be threatened by the issue underlying this error message.

Perhaps it would be an option to turn this into an INFO level of logged message, instead of an actual exception. Will discuss with the rest of the iRODS Team.

Yes, functionally there are no problems, it's just the messaging to the user.

qubixes avatar Mar 14 '25 15:03 qubixes

Also (again to @qubixes ) - assuming I'm able to reproduce at some point - which older version of PRC corrects the problem?

I did a little bit of a manual bisect, and I get the problem with PRC==2.2.0, but not with PRC==2.1.0. Let me know if you need me to bisect on commits.

qubixes avatar Mar 14 '25 15:03 qubixes

not too many to eyeball there...

https://github.com/irods/python-irodsclient/compare/v2.1.0...v2.2.0

trel avatar Mar 14 '25 16:03 trel

not too many to eyeball there...

v2.1.0...v2.2.0

No, but I see 2 or 3 that are potentially influencing changes.

d-w-moore avatar Mar 14 '25 17:03 d-w-moore

Since unhandled exceptions aren't good in finalizers, this loose end should be tied up neatly and logged. But the logging shouldn't be seen at the default level of WARNING, so let's do an INFO level message that simply displays the object representation of the Exception.

d-w-moore avatar Mar 17 '25 16:03 d-w-moore

It's still elusive to try and reproduce this error, so although I have what I think is a fix, I can't really test it.

d-w-moore avatar Mar 17 '25 17:03 d-w-moore

For now, keep your solution to this in its own branch until we can reproduce it. That keeps other PRs from being blocked by this.

korydraughn avatar Mar 17 '25 18:03 korydraughn

It's still elusive to try and reproduce this error, so although I have what I think is a fix, I can't really test it.

I could test it if you want!

qubixes avatar Mar 18 '25 07:03 qubixes

It's still elusive to try and reproduce this error, so although I have what I think is a fix, I can't really test it.

I could test it if you want!

Yes, it's certainly a help to find conditions that reproduce it; could help us to get a fix in sooner.

d-w-moore avatar Mar 18 '25 13:03 d-w-moore

@d-w-moore - Please share the branch and/or diff with the experimental fix and perhaps @qubixes can apply it and try it out?

alanking avatar Mar 18 '25 13:03 alanking

@d-w-moore - Please share the branch and/or diff with the experimental fix and perhaps @qubixes can apply it and try it out?

Good idea, will do.

d-w-moore avatar Mar 19 '25 19:03 d-w-moore

@qubixes Please see dev branch https://github.com/d-w-moore/python-irodsclient/tree/691.m , the very latest commit 3239d54bc9191c36ad6d3ba189ef6ba2f8b85d73, as a basis for how to test for whether my existing patch helps in this issue

d-w-moore avatar Mar 19 '25 19:03 d-w-moore

@qubixes Please see dev branch https://github.com/d-w-moore/python-irodsclient/tree/691.m , the very latest commit 3239d54, as a basis for how to test for whether my existing patch helps in this issue

Yes, it does help the issue, I don't get a message about the connection anymore with the default logging level.

I can still reproduce the error message for the latest version on PyPI (3.0.0).

qubixes avatar Mar 20 '25 10:03 qubixes

Good.

@d-w-moore Does @qubixes' recent success with the change help us understand the problem?

korydraughn avatar Mar 20 '25 12:03 korydraughn

Good.

@d-w-moore Does @qubixes' recent success with the change help us understand the problem?

Not on any grand level. But it does suppress a message that was potentially not useful for the application user (ie noise control) -- provided the default WARNING log level isn't decreased for the log message in question. It also means we've put a bandage on what was a potential troublemaker, since exceptions should never allowed to be unhandled within the scope of the __del__ method of an object.

d-w-moore avatar Mar 21 '25 02:03 d-w-moore

@qubixes - I tried again with iRODS 4.2.12, PRC v2.2.0, with PAM & SSL login, using your script above almost exactly, and am still not able to reproduce, which means I can't really write a test.

Can you think of any other special conditions (or modifications relative to a "fresh" install) that apply to your system? Or perhaps a set of steps you go through that is guaranteed to produce the error?

d-w-moore avatar Mar 25 '25 07:03 d-w-moore

@d-w-moore I have asked @stsnel, since he knows a lot more about the system than I do. For now we have the server logs from the server side from when I run the script:


Mar 25 09:17:19 pid:6687 NOTICE: writeLine: inString = {<email>#nluu12p} Agent process started from <ip1>
Mar 25 09:17:19 pid:6637 remote addresses: <ip2>, <ip1> ERROR: [-]     /irods_source/server/core/src/rsApiHandler.cpp:542:int readAndProcClientM
sg(rsComm_t *, int) :  status [SYS_HEADER_READ_LEN_ERR]  errno [] -- message [failed to call 'read header']
        [-]     /irods_source/lib/core/src/sockComm.cpp:198:irods::error readMsgHeader(irods::network_object_ptr, msgHeader_t *, struct timeval *) :  status [SYS
_HEADER_READ_LEN_ERR]  errno [] -- message [failed to call 'read header']
                [-]     /irods_source/plugins/network/ssl/libssl.cpp:574:irods::error ssl_read_msg_header(irods::plugin_context &, void *, struct timeval *) :  s
tatus [SYS_HEADER_READ_LEN_ERR]  errno [] -- message [read 0 expected 4]

Mar 25 09:17:19 pid:6637 remote addresses: <ip2>, <ip1> ERROR: [-]     /irods_source/server/core/src/rodsAgent.cpp:577:int runIrodsAgentFactory(
sockaddr_un) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [failed to call 'agent stop']
        [-]     /irods_source/lib/core/src/sockComm.cpp:160:irods::error sockAgentStop(irods::network_object_ptr) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [failed to call 'agent stop']
                [-]     /irods_source/plugins/network/ssl/libssl.cpp:954:irods::error ssl_agent_stop(irods::plugin_context &) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [error completing shutdown of SSL connection]

qubixes avatar Mar 25 '25 08:03 qubixes

Preliminary results for UU Yoda 1.9.x environment (based on iRODS 4.2.12) with Python 3.12.3 client:

  • Python-irodsclient 2.0.0: issue does not occur ✅
  • Python-irodsclient 2.1.0: issue does not occur ✅
  • Python-irodsclient 2.2.0: issue occurs reliably 🔴
  • Python-irodsclient 3.0.0: issue does not occur ✅
  • Python-irodsclient 3.1.0: issue occurs reliably 🔴

If I run a bisect between 3.0.0 and 3.1.0, I get:

bee82948ba15227b7cca990b133f8d2b829533cd is the first bad commit

Link to this commit: https://github.com/irods/python-irodsclient/commit/bee82948ba15227b7cca990b133f8d2b829533cd

I'll check with @qubixes to see if we can compare notes and gather more information.

stsnel avatar Mar 25 '25 16:03 stsnel

Checking into this again. It could be sensitive to the slightest things, and I'm beginning to think the Python version makes the difference (It wouldn't be the first time, esp where interpreter exit / GC issues are concerned). I had not been testing with anything as recent as Python 3.11 or 3.12....

@stsnel , I was wondering what you use in your bisect to determine success/failure. Do you have a ready-made script or were you evaluating the condition manually?

d-w-moore avatar Mar 26 '25 13:03 d-w-moore

@stsnel , I was wondering what you use in your bisect to determine success/failure. Do you have a ready-made script or were you evaluating the condition manually?

I was using the script provided by Qubixes at https://github.com/irods/python-irodsclient/issues/691#issue-2918364743 . I invoked it three times per PRC version to determine if the failure occurred (sort of) consistently.

stsnel avatar Mar 26 '25 13:03 stsnel

Checking into this again. It could be sensitive to the slightest things, and I'm beginning to think the Python version makes the difference (It wouldn't be the first time, esp where interpreter exit / GC issues are concerned). I had not been testing with anything as recent as Python 3.11 or 3.12....

Right, that makes sense, I also see that I get different results with Python 3.9.20 on my system:

  • Python-irodsclient 2.0.0: issue does not occur ✅
  • Python-irodsclient 2.1.0: issue does not occur ✅
  • Python-irodsclient 2.2.0: issue occurs reliably 🔴
  • Python-irodsclient 3.0.0: issue occurs reliably 🔴
  • Python-irodsclient 3.1.0: issue occurs reliably 🔴

stsnel avatar Mar 26 '25 13:03 stsnel

Unfortunately I'm not getting consistent results with yours , @stsnel and @qubixes . Still unable to reproduce the error dump:

(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ 

Using this script and alice as the PAM login with password 'apass'

(py312) testuser@ac99544526dd:/tmp$ more pamss691.py 
#!/usr/bin/env python3
import sys
from irods.session import iRODSSession

if __name__  == "__main__":
    env_file = sys.argv[1]
    try:
        session = iRODSSession(password="apass1", irods_env_file=env_file)
        print(session.server_version)
    except Exception:
        pass
    session = iRODSSession(password="apass", irods_env_file=env_file)
    print(f'{session.server_version = }')
    print(f'{session.pool.account.authentication_scheme = }')
    print(f'{sys.version = }')
    print(f'{type(list(session.pool.idle|session.pool.active)[0].socket) = }')

and in the same terminal session

(py312) testuser@ac99544526dd:/tmp$ python -c 'import irods;print(irods.__version__)'
2.2.0
(py312) testuser@ac99544526dd:/tmp$ 

d-w-moore avatar Mar 27 '25 12:03 d-w-moore

@stsnel - Any changes to the irods.client_configuration settings and/or environment variables that are in force (related to those settings or perhaps to Yoda) may be influencing the results. Also, which OS are you running on?

d-w-moore avatar Mar 27 '25 13:03 d-w-moore

Hi @d-w-moore,

I've performed these tests on a laptop running Ubuntu 24.04 LTS.

I don't see any relevant environment variables that could influence this behaviour. Please find the settings in the irods_environment file on the client below:

$ grep -vE "(irods_home|irods_user_name|irods_zone_name|irods_host)" ~/.irods/irods_environment.json
{
    "irods_authentication_scheme": "pam",
    "irods_client_server_negotiation": "request_server_negotiation",
    "irods_client_server_policy": "CS_NEG_REQUIRE",
    "irods_port": 1247,
    "irods_default_resource": "irodsResc",
    "irods_encryption_algorithm": "AES-256-CBC",
    "irods_encryption_key_size": 32,
    "irods_encryption_num_hash_rounds": 16,
    "irods_encryption_salt_size": 8,
}

Settings on the provider that we're connecting to:

$ grep -E "(irods_client_server|irods_encryption)" .irods/irods_environment.json
    "irods_client_server_negotiation": "request_server_negotiation", 
    "irods_client_server_policy": "CS_NEG_REQUIRE", 
    "irods_encryption_algorithm": "AES-256-CBC", 
    "irods_encryption_key_size": 32, 
    "irods_encryption_num_hash_rounds": 16, 
    "irods_encryption_salt_size": 8, 

I can reproduce the issue with your script in combination with Python 3.12.3 and PRC 3.1.0:

$ ./test-daniel.py ~/.irods/irods_environment.json
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
Exception ignored in: <function Connection.__del__ at 0x7e6ae95fc900>
Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 147, in __del__
    self.disconnect()
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 376, in disconnect
    self.send(disconnect_msg)
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 163, in send
    raise NetworkException("Unable to send message")
irods.exception.NetworkException: Unable to send message

If I print a traceback for the exception that is raised when creating the first session, it seems to complain about a missing import:

Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 416, in __server_version
    conn = next(iter(self.pool.active))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/pool.py", line 93, in get_connection
    conn = self.idle.pop()
           ^^^^^^^^^^^^^^^
KeyError: 'pop from an empty set'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 600, in _login_pam
    output_message = self.recv()
                     ^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 196, in recv
    raise exc
irods.exception.PAM_AUTH_PASSWORD_FAILED: None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/some-directory/./test-daniel.py", line 11, in <module>
    print(session.server_version)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 379, in server_version
    return self._server_version()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 412, in _server_version
    return self.__server_version() if version_func is None else version_func(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 419, in __server_version
    conn = self.pool.get_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/pool.py", line 18, in method_
    ret = method(self, *s, **kw)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/pool.py", line 117, in get_connection
    conn = Connection(self, self.account)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 123, in __init__
    self._login_pam()
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 601, in _login_pam
    except irods.exception.PAM_AUTH_PASSWORD_INVALID_TTL as exc:
           ^^^^^
NameError: name 'irods' is not defined

So it looks like connection.py needs an additional import here. However, adding the import does not resolve the issue, so this does not look related. Have submitted a separate PR for this: https://github.com/irods/python-irodsclient/pull/705

stsnel avatar Mar 28 '25 10:03 stsnel

I can reproduce the issue with your script in combination with Python 3.12.3 and PRC 3.1.0:

$ ./test-daniel.py ~/.irods/irods_environment.json
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
Exception ignored in: <function Connection.__del__ at 0x7e6ae95fc900>
Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 147, in __del__
    self.disconnect()
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 376, in disconnect
    self.send(disconnect_msg)
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 163, in send
    raise NetworkException("Unable to send message")
irods.exception.NetworkException: Unable to send message

hi @stsnel , Indeed - I was able to see the error message myself with these exact versions, except using Ubuntu 18.04. Presumably the OS doesn't matter as much. Thanks again!

d-w-moore avatar Mar 28 '25 13:03 d-w-moore

Since the issue at hand seems solved in https://github.com/d-w-moore/python-irodsclient/tree/691.m, and @stsnel has merged a fix for #705 we'll consider for the time being no more work needs to be done here.

d-w-moore avatar Jun 03 '25 03:06 d-w-moore