Don't document example code which causes memory leaks
The example `sp-wsgi/sp.py` stores `response.name_id`, which is a `saml2.saml.NameID` instance.
https://github.com/IdentityPython/pysaml2/blob/master/example/sp-wsgi/sp.py#L404-L405
https://github.com/IdentityPython/pysaml2/blob/master/example/sp-wsgi/sp.py#L335-L339
Holding this in memory across multiple requests bloats the RAM / RSS. For 800 logged-in users we see a lot of stdlib instances, many of which belong to or reference this name_id object:
```
dict   7625
tuple  7113
list   6421
```
For a single-core CPU running with cherrypy I run into performance problems when 4000 users are logged in (maybe because the garbage collector can't remove all objects fast enough?).
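For reference, per-type object counts like the ones above can be gathered in a running Python process roughly like this (one possible approach, not necessarily the tool that produced these numbers):

```python
import gc
from collections import Counter

# Tally live objects the garbage collector knows about, grouped by type name.
counts = Counter(type(obj).__name__ for obj in gc.get_objects())
for name in ("dict", "tuple", "list", "NameID"):
    print(name, counts.get(name, 0))
```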
The problem went away when I stopped storing the NameID object and instead stored:

```python
from saml2.ident import code

name_id = code(response.name_id)
```
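A minimal sketch of how that fix can look in application code, assuming a per-user `session` mapping; `on_login`/`on_logout` are illustrative helpers, not code from the example:

```python
from saml2.ident import code, decode

def on_login(session, response):
    # Keep only the compact string form of the NameID; it is cheap to hold
    # in memory or a session store and survives serialization.
    session["name_id"] = code(response.name_id)

def on_logout(sp, session):
    # Reconstruct the saml2.saml.NameID object only when it is needed,
    # e.g. to clean up the client's caches.
    name_id = decode(session.pop("name_id"))
    sp.local_logout(name_id)
```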
Interesting, but we really shouldn't use cherrypy.
The reason for the leak is not cherrypy.
There is another leak:
Commit 87e51d62847f453cee546889df0c338b1b529495 removed the deletion of `self.outstanding_queries[self.in_response_to]`.
So this dictionary is never cleared: not in the pysaml2 code, not in the example.
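One way for the application to restore that cleanup (a sketch against the public `parse_authn_request_response()` API, not the original example code):

```python
def handle_acs(sp, message, binding, outstanding_queries):
    # sp is a saml2.client.Saml2Client; outstanding_queries is the dict the
    # application filled when it created the AuthnRequests.
    response = sp.parse_authn_request_response(message, binding, outstanding_queries)
    # Forget the request this response answers so the dict cannot grow forever.
    outstanding_queries.pop(response.in_response_to, None)
    return response
```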
@peppelinux @c00kiemon5ter Okay, I identified another memory problem in our implementation. I need to ask you some questions so that I don't have to read through all of the pysaml2 code. Maybe you can help a little further.
The NameID XML structure is stored in the `identity_cache`, which in my case is the in-memory dict. In the BDB cache variant this is presumably serialized.
```
(Pdb) sp
<saml2.client.Saml2Client object at 0x7f5c163d4150>
(Pdb) sp.users.cache._db.keys()[0]
'1=https%3A//master80.school.dev/univention/saml/metadata,2=urn%3Aoasis%3Anames%3Atc%3ASAML%3A2.0%3Anameid-format%3Atransient,4=_778738a16b7c112cf01c8f3048853a8260a54dc3ba'
(Pdb) sp.users.cache._db.values()[0]
{'https://ucs-sso.school.dev/simplesamlphp/saml2/idp/metadata.php': (1604424934, {'authn_info': [('urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport', [])], 'name_id': <saml2.saml.NameID object at 0x7f5c1447cdd0>, 'not_on_or_after': 1604424934, 'came_from': 'https://master80.school.dev/univention/saml/', 'ava': {'uid': [u'Administrator']}})}
```
First I notice that there is an entry for every SAML message ever received, because our NameID format is transient. Maybe I should change this to persistent?!
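If persistent turns out to be the right choice, the requested format would be a one-line change in the SP configuration. A hypothetical fragment, assuming the usual pysaml2 dict config with everything else elided:

```python
from saml2.saml import NAMEID_FORMAT_PERSISTENT

CONFIG = {
    # ... entityid, metadata, key/cert settings elided ...
    "service": {
        "sp": {
            # Ask the IdP for a stable identifier instead of a new transient
            # one per authentication, so the identity cache keys stay stable.
            "name_id_format": NAMEID_FORMAT_PERSISTENT,
        },
    },
}
```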
Then I will write a patch which stores the serialized (`saml2.ident.code()`) version of the NameID instead of the object itself.
What is this information (the identity cache) needed for? Probably for the logout functionality? Anything else?
Is this information still relevant once the `SessionNotOnOrAfter` time has passed? I don't see a mechanism that removes cache entries after that time (a possible sweep is sketched after these questions). Is it relevant once the `NotOnOrAfter` time has passed?
When I destroy my session, I probably should call some function to remove entries from the cache. ~~I still have to find out which function.~~ `sp.users.remove_person(name_id)` works.
When I use the BDB cache, does bdb implement locking? (I am doing multiprocessing.)
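Regarding the missing expiry mechanism from the second question: a hedged sketch of a periodic sweep over the in-memory cache, relying only on the `(not_on_or_after, info)` tuples visible in the Pdb dump above. It pokes at internal attributes (`sp.users.cache._db`), which may change between releases:

```python
import time

def sweep_expired(sp):
    # sp is a saml2.client.Saml2Client using the default in-memory Cache.
    now = time.time()
    for cni in list(sp.users.cache._db.keys()):
        entities = sp.users.cache._db[cni]
        # Drop the subject once every per-IdP entry has expired.
        if all(not_on_or_after < now for not_on_or_after, _info in entities.values()):
            del sp.users.cache._db[cni]
```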
> Then I will write a patch which stores the serialized (`saml2.ident.code()`) version of the NameID instead of the object itself.

Oh, this is already done in fbb0b5a1e76ff2e638dafcbd8cf6e730d1d68e0a.

> When I destroy my session, I probably should call some function to remove entries from the cache. ~~I still have to find out which function.~~

`sp.users.remove_person(name_id)` works. `sp.local_logout(name_id)` is even better ;-)
https://github.com/knaperek/djangosaml2/blob/a382e9e81d7f4a1a440a7b0130250f5c2b54c4b9/djangosaml2/views.py#L84
https://github.com/knaperek/djangosaml2/blob/a382e9e81d7f4a1a440a7b0130250f5c2b54c4b9/djangosaml2/views.py#L452
`state.sync()` would be part of a wider action.
I can guide you through the djangosaml2 implementation and also the uniAuth IdP. I used to run many uwsgi workers and I never ran into memory leaks or concurrency issues.
Regarding concurrency problems after switching to BDB:
```
Traceback (most recent call last):
    response = self.sp.parse_authn_request_response(message, binding, self.outstanding_queries)
  File "/usr/lib/python2.7/dist-packages/saml2/client_base.py", line 594, in parse_authn_request_response
    self.users.add_information_about_person(resp.session_info())
  File "/usr/lib/python2.7/dist-packages/saml2/population.py", line 26, in add_information_about_person
    session_info["not_on_or_after"])
  File "/usr/lib/python2.7/dist-packages/saml2/cache.py", line 120, in set
    self._db.sync()
  File "/usr/lib/python2.7/shelve.py", line 168, in sync
    for key, entry in self.cache.iteritems():
RuntimeError: dictionary changed size during iteration
```
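Until a proper shared store is in place, one workaround is to serialize access to the shelve/bsddb file across workers, for example with an fcntl file lock. This is a sketch only; the lock path and the granularity of locking are assumptions:

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def identity_cache_lock(path="/run/lock/sp-identity-cache.lock"):
    # One exclusive lock shared by all worker processes on the same host.
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Usage: wrap every call that touches sp.users / the BDB-backed cache, e.g.
# with identity_cache_lock():
#     response = sp.parse_authn_request_response(message, binding, outstanding_queries)
```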
Shared memory, shared objects ... a dictionary whose size changes while another worker iterates over it in its own I/O loop ... That's the evil. Put a shared storage behind this kind of persistence and you will live a hundred years.
We also applied the following patch:
```diff
From 440de09561f483b83675cb76f377b81827cd1fbf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=BCrn=20Brodersen?= <[email protected]>
Date: Thu, 12 Nov 2020 16:01:34 +0100
Subject: [PATCH] Bug #99999: Disable writeback for shelve to save memory

The writeback cache can grow quite large, which consumes memory. With
a small change in the `set` function, the writeback cache is not
needed.
---
 src/saml2/cache.py | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/saml2/cache.py b/src/saml2/cache.py
index faccfd9..abf6a70 100644
--- a/src/saml2/cache.py
+++ b/src/saml2/cache.py
@@ -23,7 +23,7 @@ class CacheError(SAMLError):
 class Cache(object):
     def __init__(self, filename=None):
         if filename:
-            self._db = shelve.open(filename, writeback=True, protocol=2)
+            self._db = shelve.open(filename, writeback=False, protocol=2)
             self._sync = True
         else:
             self._db = {}
@@ -116,10 +116,14 @@ class Cache(object):
             info['name_id'] = code(name_id)
 
         cni = code(name_id)
-        if cni not in self._db:
-            self._db[cni] = {}
+        data = self._db.get(cni, {})
+        data[entity_id] = (not_on_or_after, info)
+
+        # If self._db is a shelf, operations on data are not written back
+        # with "writeback=False". The data object in the shelf needs to be
+        # explicitly overwritten.
+        self._db[cni] = data
 
-        self._db[cni][entity_id] = (not_on_or_after, info)
         if self._sync:
             try:
                 self._db.sync()
```
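The stdlib behaviour the patch relies on, shown in isolation with plain `shelve` (nothing pysaml2-specific):

```python
import shelve

db = shelve.open("/tmp/writeback-demo", writeback=False)

db["user"] = {}
db["user"]["idp"] = "entry"   # mutates a temporary copy; nothing is written back
print(db["user"])             # {}

data = db.get("user", {})
data["idp"] = "entry"
db["user"] = data             # explicit reassignment is what actually persists
print(db["user"])             # {'idp': 'entry'}

db.close()
```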
That's something for a PR; please open one so we can see whether it passes CI, then @c00kiemon5ter will see how and when to merge it. Looks reasonable to me, great job.
... afaik a cache would be better implemented in a real DB, an RDBMS or NoSQL. I'd really discourage a shelve/pickle storage.
So, is this a memory leak problem or not?