cattrs icon indicating copy to clipboard operation
cattrs copied to clipboard

memory leak of GenConverter with detailed_validation = True

Open Bryant-Yang opened this issue 1 year ago • 5 comments

while True: # creating GenConverter instance with all default params. # registering some simple structure hook, # loading some data,

the memory usage of process keep increasing.

If set detailed_validation = False, there is no memory leak. I'm not sure if it cost by line cache.

Bryant-Yang avatar Jul 27 '22 06:07 Bryant-Yang

Interesting, the hooks should be generated only once in any case. Would be curious if you can get a small repro?

Tinche avatar Jul 27 '22 09:07 Tinche

I tried getting a repro:

from attrs import define

from cattrs import GenConverter

c = GenConverter()


@define
class Test:
    a: int
    b: int
    c: int


while True:
    c.structure({"a": 1, "b": 2, "c": 3}, Test)

but it doesn't leak.

Tinche avatar Jul 27 '22 09:07 Tinche

This issue is found in one of my test server which has little memory, in a scheduled task loop which cause memory usage too high, had to reboot after running 2 days.

I have made my converter usage as singleton. It's just not a clear constraint that this converter object could not be gc collected in some situations.

I can give a simple test as below:

import gc
from datetime import datetime
from typing import List

import cattr
import time
from attr import attrs

def create_converter(detailed_validation):
    converter = cattr.GenConverter(detailed_validation=detailed_validation)
    converter.register_structure_hook(datetime,
                                      lambda val, _: datetime.strptime(val, '%Y-%m-%d %H:%M:%S')
                                      if val is not None else None)
    return converter


@attrs(auto_attribs=True)
class TestData:
    @attrs(auto_attribs=True)
    class InfoList:
        name: str

    name: str
    dt: datetime
    info_list: List[InfoList]


def test(detailed_validation):
    converter = create_converter(detailed_validation)
    data = {'name': "testAAA", "dt": datetime.now().strftime('%Y-%m-%d %H:%M:%S'), "info_list": [{'name': "testAAA"}] * 100}
    for _ in range(1000):
        converter.structure(data, TestData)


if __name__ == '__main__':
    import os
    import psutil
    process = psutil.Process(os.getpid())
    while True:
        test(detailed_validation=True)  # memory usage increasing
        # test(detailed_validation=False)   # memory usage not increasing
        gc.collect()
        time.sleep(.1)
        print(f'mem info: {process.memory_info().rss}')

Interesting, the hooks should be generated only once in any case. Would be curious if you can get a small repro?

Bryant-Yang avatar Jul 27 '22 10:07 Bryant-Yang

Ah, you seem to be creating a converter in a loop. I strongly advise not doing that, not only with cattrs, but also with libraries like attrs and dataclasses. All of these libraries have heavy initialization steps (including code generation) to make them faster afterwards; if you're constantly creating new converters you're paying the price and getting no benefit.

This is probably a leak somewhere in linecache, yeah. The hooks are generated once per converter. I can take a look when I have spare cycles, but due to what I said in the first paragraph I think this is lower prio.

Tinche avatar Jul 27 '22 10:07 Tinche

yes,don't create instance repeatedly if not necessary,agree that.

Bryant-Yang avatar Jul 27 '22 12:07 Bryant-Yang

This should actually already be fixed by #464

Tinche avatar Dec 26 '23 01:12 Tinche