Don't document example code which causes memory leaks
The example `sp-wsgi/sp.py` stores `response.name_id`, which is a `saml2.saml.NameID` instance.
https://github.com/IdentityPython/pysaml2/blob/master/example/sp-wsgi/sp.py#L404-L405
https://github.com/IdentityPython/pysaml2/blob/master/example/sp-wsgi/sp.py#L335-L339
Holding this in memory across multiple requests bloats the RAM / RSS. For 800 logged-in users we see a lot of stdlib instances, many of which belong to or reference this name_id object:
```
dict   7625
tuple  7113
list   6421
```
For a single-core CPU running with cherrypy I run into performance problems when 4000 users are logged in (maybe because the garbage collector can't remove all objects fast enough?).
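For reference, per-type object counts like the ones above can be gathered in a running Python process roughly like this (one possible approach, not necessarily the tool that produced these numbers):

```python
import gc
from collections import Counter

# Tally live objects the garbage collector knows about, grouped by type name.
counts = Counter(type(obj).__name__ for obj in gc.get_objects())
for name in ("dict", "tuple", "list", "NameID"):
    print(name, counts.get(name, 0))
```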
The problem went away when I stopped storing the NameID object and instead stored:

```python
from saml2.ident import code

name_id = code(response.name_id)
```
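A minimal sketch of how that fix can look in application code, assuming a per-user `session` mapping; `on_login`/`on_logout` are illustrative helpers, not code from the example:

```python
from saml2.ident import code, decode

def on_login(session, response):
    # Keep only the compact string form of the NameID; it is cheap to hold
    # in memory or a session store and survives serialization.
    session["name_id"] = code(response.name_id)

def on_logout(sp, session):
    # Reconstruct the saml2.saml.NameID object only when it is needed,
    # e.g. to clean up the client's caches.
    name_id = decode(session.pop("name_id"))
    sp.local_logout(name_id)
```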
Interesting, but we really shouldn't use cherrypy.
The reason for the leak is not cherrypy.
There is another leak:
Commit 87e51d62847f453cee546889df0c338b1b529495 removed the deletion of `self.outstanding_queries[self.in_response_to]`.
So this dictionary is never cleared: not in the pysaml2 code, not in the example.
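One way for the application to restore that cleanup (a sketch against the public `parse_authn_request_response()` API, not the original example code):

```python
def handle_acs(sp, message, binding, outstanding_queries):
    # sp is a saml2.client.Saml2Client; outstanding_queries is the dict the
    # application filled when it created the AuthnRequests.
    response = sp.parse_authn_request_response(message, binding, outstanding_queries)
    # Forget the request this response answers so the dict cannot grow forever.
    outstanding_queries.pop(response.in_response_to, None)
    return response
```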
@peppelinux @c00kiemon5ter Okay, I identified another memory problem in our implementation. I need to ask you some questions so that I don't have to read through all of the pysaml2 code. Maybe you can help a little further.
The NameID XML structure is stored in the `identity_cache`, which in my case is the in-memory dict. In the BDB cache variant this is presumably serialized.
```
(Pdb) sp
<saml2.client.Saml2Client object at 0x7f5c163d4150>
(Pdb) sp.users.cache._db.keys()[0]
'1=https%3A//master80.school.dev/univention/saml/metadata,2=urn%3Aoasis%3Anames%3Atc%3ASAML%3A2.0%3Anameid-format%3Atransient,4=_778738a16b7c112cf01c8f3048853a8260a54dc3ba'
(Pdb) sp.users.cache._db.values()[0]
{'https://ucs-sso.school.dev/simplesamlphp/saml2/idp/metadata.php': (1604424934, {'authn_info': [('urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport', [])], 'name_id': <saml2.saml.NameID object at 0x7f5c1447cdd0>, 'not_on_or_after': 1604424934, 'came_from': 'https://master80.school.dev/univention/saml/', 'ava': {'uid': [u'Administrator']}})}
```
First I notice that there is an entry for every SAML message ever received, because our NameID format is transient. Maybe I should change this to persistent?!
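If persistent turns out to be the right choice, the requested format would be a one-line change in the SP configuration. A hypothetical fragment, assuming the usual pysaml2 dict config with everything else elided:

```python
from saml2.saml import NAMEID_FORMAT_PERSISTENT

CONFIG = {
    # ... entityid, metadata, key/cert settings elided ...
    "service": {
        "sp": {
            # Ask the IdP for a stable identifier instead of a new transient
            # one per authentication, so the identity cache keys stay stable.
            "name_id_format": NAMEID_FORMAT_PERSISTENT,
        },
    },
}
```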
Then I will write a patch which stores the serialized (`saml2.ident.code()`) version of the NameID instead of the object itself.
What is this information (the identity cache) needed for? Probably for the logout functionality? Anything else?
Is this information still relevant once the `SessionNotOnOrAfter` time has passed? I don't see a mechanism that removes cache entries after that time (a possible sweep is sketched after these questions). Is it relevant once the `NotOnOrAfter` time has passed?
When I destroy my session, I probably should call some function to remove entries from the cache. ~~I still have to find out which function.~~ `sp.users.remove_person(name_id)` works.
When I use the BDB cache, does bdb implement locking? (I am doing multiprocessing.)
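Regarding the missing expiry mechanism from the second question: a hedged sketch of a periodic sweep over the in-memory cache, relying only on the `(not_on_or_after, info)` tuples visible in the Pdb dump above. It pokes at internal attributes (`sp.users.cache._db`), which may change between releases:

```python
import time

def sweep_expired(sp):
    # sp is a saml2.client.Saml2Client using the default in-memory Cache.
    now = time.time()
    for cni in list(sp.users.cache._db.keys()):
        entities = sp.users.cache._db[cni]
        # Drop the subject once every per-IdP entry has expired.
        if all(not_on_or_after < now for not_on_or_after, _info in entities.values()):
            del sp.users.cache._db[cni]
```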
> Then I will write a patch which stores the serialized (`saml2.ident.code()`) version of the NameID instead of the object itself.

Oh, this is already done in fbb0b5a1e76ff2e638dafcbd8cf6e730d1d68e0a.

> When I destroy my session, I probably should call some function to remove entries from the cache. ~~I still have to find out which function.~~

`sp.users.remove_person(name_id)` works. `sp.local_logout(name_id)` is even better ;-)
https://github.com/knaperek/djangosaml2/blob/a382e9e81d7f4a1a440a7b0130250f5c2b54c4b9/djangosaml2/views.py#L84
https://github.com/knaperek/djangosaml2/blob/a382e9e81d7f4a1a440a7b0130250f5c2b54c4b9/djangosaml2/views.py#L452
`state.sync()` would be part of a wider action.
I can guide you through the djangosaml2 implementation and also the uniAuth IdP. I used to run many uwsgi workers and I never ran into memory leaks or concurrency issues.
Regarding concurrency problems after switching to BDB:
```
Traceback (most recent call last):
    response = self.sp.parse_authn_request_response(message, binding, self.outstanding_queries)
  File "/usr/lib/python2.7/dist-packages/saml2/client_base.py", line 594, in parse_authn_request_response
    self.users.add_information_about_person(resp.session_info())
  File "/usr/lib/python2.7/dist-packages/saml2/population.py", line 26, in add_information_about_person
    session_info["not_on_or_after"])
  File "/usr/lib/python2.7/dist-packages/saml2/cache.py", line 120, in set
    self._db.sync()
  File "/usr/lib/python2.7/shelve.py", line 168, in sync
    for key, entry in self.cache.iteritems():
RuntimeError: dictionary changed size during iteration
```
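Until a proper shared store is in place, one workaround is to serialize access to the shelve/bsddb file across workers, for example with an fcntl file lock. This is a sketch only; the lock path and the granularity of locking are assumptions:

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def identity_cache_lock(path="/run/lock/sp-identity-cache.lock"):
    # One exclusive lock shared by all worker processes on the same host.
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Usage: wrap every call that touches sp.users / the BDB-backed cache, e.g.
# with identity_cache_lock():
#     response = sp.parse_authn_request_response(message, binding, outstanding_queries)
```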
Shared memory, shared objects ... a dictionary whose size changes while another worker iterates over it in its own I/O loop ... That's the evil. Put a shared storage behind this kind of persistence and you will live a hundred years.
We also applied the following patch:
```diff
From 440de09561f483b83675cb76f377b81827cd1fbf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=BCrn=20Brodersen?= <[email protected]>
Date: Thu, 12 Nov 2020 16:01:34 +0100
Subject: [PATCH] Bug #99999: Disable writeback for shelve to save memory

The writeback cache can grow quite large, which consumes memory. With
a small change in the `set` function, the writeback cache is not
needed.
---
 src/saml2/cache.py | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/saml2/cache.py b/src/saml2/cache.py
index faccfd9..abf6a70 100644
--- a/src/saml2/cache.py
+++ b/src/saml2/cache.py
@@ -23,7 +23,7 @@ class CacheError(SAMLError):
 class Cache(object):
     def __init__(self, filename=None):
         if filename:
-            self._db = shelve.open(filename, writeback=True, protocol=2)
+            self._db = shelve.open(filename, writeback=False, protocol=2)
             self._sync = True
         else:
             self._db = {}
@@ -116,10 +116,14 @@ class Cache(object):
             info['name_id'] = code(name_id)
 
         cni = code(name_id)
-        if cni not in self._db:
-            self._db[cni] = {}
+        data = self._db.get(cni, {})
+        data[entity_id] = (not_on_or_after, info)
+
+        # If self._db is a shelf, operations on data are not written back
+        # with "writeback=False". The data object in the shelf needs to be
+        # explicitly overwritten.
+        self._db[cni] = data
 
-        self._db[cni][entity_id] = (not_on_or_after, info)
         if self._sync:
             try:
                 self._db.sync()
```
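The stdlib behaviour the patch relies on, shown in isolation with plain `shelve` (nothing pysaml2-specific):

```python
import shelve

db = shelve.open("/tmp/writeback-demo", writeback=False)

db["user"] = {}
db["user"]["idp"] = "entry"   # mutates a temporary copy; nothing is written back
print(db["user"])             # {}

data = db.get("user", {})
data["idp"] = "entry"
db["user"] = data             # explicit reassignment is what actually persists
print(db["user"])             # {'idp': 'entry'}

db.close()
```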
That's something for a PR; please open one so we can see whether it passes CI, then @c00kiemon5ter will see how and when to merge it. Looks reasonable to me, great job.
... afaik a cache would be better implemented in a real DB, an RDBMS or NoSQL. I'd really discourage a shelve/pickle storage.
So, is this a memory leak problem or not?