
Possible Memory leak

Open LiKao opened this issue 9 years ago • 3 comments

It seems that couchdb-python, or one of the libraries it uses, leaks memory. This can lead to memory exhaustion on big tasks. A simple example that only reads from a server with about 20k documents:

import psutil
import os
import gc

from couchdb import Server

def show_memory():
    process = psutil.Process(os.getpid())
    meminfo = process.memory_info()
    print('Memory usage:')
    print("\tResident: %d (kb)" % (meminfo.rss / 1024))
    print("\tVirtual:  %d (kb)" % (meminfo.vms / 1024))


# url and db_name are placeholders; the report does not show their values
server = Server(url)
db = server[db_name]

print("Before")
show_memory()

# Iterate over all document IDs without keeping any of them
for x in db:
    pass

print("After")
show_memory()

print("After collect")
gc.collect()
show_memory()

print("DB deleted")
del db
show_memory()

print("Server deleted")
del server
show_memory()

Output:

Before
Memory usage:
    Resident: 16444 (kb)
    Virtual:  98680 (kb)
After
Memory usage:
    Resident: 18444 (kb)
    Virtual:  102928 (kb)
After collect
Memory usage:
    Resident: 17932 (kb)
    Virtual:  102416 (kb)
DB deleted
Memory usage:
    Resident: 17932 (kb)
    Virtual:  102416 (kb)
Server deleted
Memory usage:
    Resident: 17932 (kb)
    Virtual:  102416 (kb)

That is, the memory is retained even after all resources have been released. During batch imports of large datasets this can lead to memory exhaustion on some systems:

Test case: import of 50k randomly generated documents, with 50 fields per document and 50 bytes per field. IDs are generated with Python's uuid4(). The documents are imported in batches of 20k using the db.update(docs) method. Batches are generated one at a time and deleted after each upload.
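For anyone who wants to reproduce this, here is a minimal sketch of how such batches could be generated. It is not the original benchmark.py: the helper names make_batch and batches and the field names field_0, field_1, ... are made up for illustration, and the upload itself is only shown as a comment because it needs a running CouchDB server.

```python
import random
import string
from uuid import uuid4


def make_batch(n_docs, n_fields=50, field_size=50):
    """Build a list of random documents shaped like the test case.

    The field names are hypothetical; only the sizes follow the report.
    """
    alphabet = string.ascii_letters + string.digits
    batch = []
    for _ in range(n_docs):
        doc = {"_id": uuid4().hex}
        for i in range(n_fields):
            doc["field_%d" % i] = "".join(
                random.choice(alphabet) for _ in range(field_size)
            )
        batch.append(doc)
    return batch


def batches(total_docs, batch_size):
    """Yield one batch at a time so the generator never holds two at once."""
    remaining = total_docs
    while remaining > 0:
        size = min(batch_size, remaining)
        yield make_batch(size)
        remaining -= size


# Intended use against a couchdb-python Database object named db:
#
#     for docs in batches(50000, 20000):
#         db.update(docs)
#         del docs
```

With 50k documents and a batch size of 20k this yields batches of 20k, 20k and 10k documents. The output below is from the original benchmark, not from this sketch.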

Output:

Generating batch nr. 0
Memory usage:
    Resident: 19096 (kb)
    Virtual:  225364 (kb)
Before Upload
Memory usage:
    Resident: 221952 (kb)
    Virtual:  428276 (kb)
Uploading Batch nr 0
Upload done, docs deleted, gc.collect()
Memory usage:
    Resident: 87528 (kb)
    Virtual:  294720 (kb)
Generating batch nr. 1
Memory usage:
    Resident: 87532 (kb)
    Virtual:  294724 (kb)
Before Upload
Memory usage:
    Resident: 226804 (kb)
    Virtual:  433988 (kb)
Uploading Batch nr 1
Traceback (most recent call last):
  File "./benchmark.py", line 198, in <module>
    do_benchmark()
  File "./benchmark.py", line 153, in do_benchmark
    db.update(docs)
  File "/usr/local/lib/python2.7/site-packages/couchdb/client.py", line 785, in update
    _, _, data = self.resource.post_json('_bulk_docs', body=content)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 545, in post_json
    **params)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 564, in _request_json
    headers=headers, **params)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 560, in _request
    credentials=self.credentials)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 261, in request
    body = json.encode(body).encode('utf-8')
  File "/usr/local/lib/python2.7/site-packages/couchdb/json.py", line 69, in encode
    return _encode(obj)
  File "/usr/local/lib/python2.7/site-packages/couchdb/json.py", line 117, in <lambda>
    dumps(obj, allow_nan=False, ensure_ascii=False)
  File "/usr/lib64/python2.7/dist-packages/simplejson/__init__.py", line 386, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python2.7/dist-packages/simplejson/encoder.py", line 275, in encode
    return u''.join(chunks)
MemoryError

That is, the first update works and the second update fails with a MemoryError. This indicates that the problem is not caused by the batch size alone: if it were, either both uploads would fail or both would succeed. Instead, some resource does not seem to be freed correctly between uploads.
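One way to narrow this down might be to check whether Python objects are actually being retained, or whether the memory is merely not handed back to the operating system (resident size can stay high even after objects are freed). The following is only a sketch under some assumptions: it uses tracemalloc, which is in the standard library from Python 3.4 on and not in the Python 2.7 shown in the traceback, and encode_and_discard is a made-up stand-in for one upload that serializes documents without contacting a server.

```python
import gc
import json
import tracemalloc


def retained_bytes(work):
    """Run work() and return the Python-level bytes still allocated afterwards."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    work()
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


def encode_and_discard():
    # Hypothetical stand-in for one upload: build documents,
    # serialize them to JSON, then drop every reference.
    docs = [{"_id": str(i), "value": "x" * 50} for i in range(10000)]
    body = json.dumps(docs).encode("utf-8")
    del docs, body


leaked = retained_bytes(encode_and_discard)
print("Python-level bytes retained: %d" % leaked)
```

If a call such as db.update(docs) were wrapped the same way and the retained figure grew by roughly one batch per upload, that would point at references being kept somewhere in the library. If it stays small while the resident size stays high, the allocator holding on to freed memory would be the more likely explanation.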

LiKao avatar Jul 13 '16 10:07 LiKao

Thanks for the report! I don't really have a lot of experience with debugging this kind of problem, and I haven't had a lot of time to work on couchdb-python recently. Thus, I'm not sure if I can devote the proper amount of attention to this issue. If anyone else wants to dig in and figure out where the problem is, that would be much appreciated. Of course, I'll be happy to review a proposed fix.

djc avatar Jul 13 '16 16:07 djc

@LiKao Did you ever find the underlying issue here? I'm having this problem as well.

iknox avatar Sep 27 '16 16:09 iknox

@iknox Unfortunately not. I also don't have enough experience debugging this kind of problem in Python, so I cannot currently investigate much further.

LiKao avatar Mar 11 '17 14:03 LiKao