
Memory leak in synchrotron worker

Open kosmos opened this issue 1 year ago • 14 comments

Description

I have the following synchrotron worker configuration:

# Sync initial/normal
location ~ ^/_matrix/client/(r0|v3)/sync$ {
  proxy_pass https://$sync;
}


# Synchrotron
location ~ ^/_matrix/client/(api/v1|r0|v3)/events$ {
  proxy_pass https://synapse_sync;
}


# Initial_sync
location ~ ^/_matrix/client/(api/v1|r0|v3)/initialSync$ {
  proxy_pass https://synapse_initial_sync;
}

location ~ ^/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$ {
  proxy_pass https://synapse_initial_sync;
}

And the following cache settings for this worker:

event_cache_size: 150K
caches:
  global_factor: 1
  expire_caches: true
  cache_entry_ttl: 30m
  sync_response_cache_duration: 2m
  cache_autotuning:
    max_cache_memory_usage: 2048M
    target_cache_memory_usage: 1792M
    min_cache_ttl: 1m
  per_cache_factors:
    stateGroupCache: 1
    stateGroupMembersCache: 5
    get_users_in_room: 20
    get_users_who_share_room_with_user: 10
    _get_linearized_receipts_for_room: 20
    get_presence_list_observers_accepted: 5
    get_user_by_access_token: 2
    get_user_filter: 2
    is_support_user: 2
    state_cache: 5
    get_current_state_ids: 5
    get_forgotten_rooms_for_user: 5

All other worker types run without problems, but there is a memory leak in the synchrotrons, which eventually exhausts all available memory.

It seems that the cache_autotuning settings are not working. The environment variable PYTHONMALLOC=malloc is set at the operating system level.

From what I can tell, the problem appeared after updating to Synapse 1.114 and is still present in 1.116.

Steps to reproduce

To reproduce the problem, you need a homeserver with a heavy load and dedicated synchrotron workers.

Homeserver

Synapse 1.116.0

Synapse Version

Synapse 1.116.0

Installation Method

Docker (matrixdotorg/synapse)

Database

PostgreSQL

Workers

Multiple workers

Platform

Configuration

No response

Relevant log output

-

Anything else that would be useful to know?

No response

kosmos avatar Oct 04 '24 17:10 kosmos

Hi @kosmos, thanks for filing an issue. Can I just double-check that you're using jemalloc in your configuration? Doing so is required for the cache_autotuning option.
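
One rough way to verify this (a minimal sketch; the PID is an assumption, adjust it to your worker process):

# Check whether jemalloc is mapped into the running worker process (Linux).
# WORKER_PID is an assumption; substitute the synchrotron worker's actual PID.
WORKER_PID = 1

with open(f"/proc/{WORKER_PID}/maps") as maps:
    print("jemalloc loaded:", any("jemalloc" in line for line in maps))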

anoadragon453 avatar Oct 07 '24 13:10 anoadragon453

@anoadragon453 I'm using the official docker image, and this feature (jemalloc) is enabled by default.

kosmos avatar Oct 07 '24 18:10 kosmos

@kosmos Did you upgrade from 1.113.0 before being on 1.114.0? I don't see any changes that are particularly related to caches in 1.114.0.

Around the time of the memory ballooning, are you seeing lots of initial syncs at once? Those requests are known to be memory intensive, especially for users in a large number of rooms.

Do you have monitoring set up for your Synapse instance? If so, could you have a look at the Caches -> Cache Size graph around the time of the memory ballooning to see what cache might be overinflating? I'm also happy to poke through your Grafana metrics if you can make them privately or publicly available. Feel free to DM me at @andrewm:element.io if you'd like to share them via that route.

anoadragon453 avatar Oct 08 '24 10:10 anoadragon453

@anoadragon453 Our update history was as follows: 1.113 -> 1.114 -> 1.116. I will write to you in a private message.

kosmos avatar Oct 09 '24 07:10 kosmos

@kosmos and I exchanged some DMs to try and get to the bottom of this. The conclusion was that @kosmos does have jemalloc enabled in their deployment, yet the memory-based cache eviction doesn't seem to be kicking in. They had no mention of this log line in their logs:

https://github.com/element-hq/synapse/blob/05576f0b4b4a1ba5e756cbd1f0a064751f233d83/synapse/util/caches/lrucache.py#L151

This is odd, as the memory of the process (2500M) is going over their configured maximum (2048M).

However, only part of the application's total memory allocations are being carried out by jemalloc. Over half(!) are still being performed by the built-in Python allocator. Looking at the stats @kosmos provided, jemalloc reported only ~700M of RAM allocated at the time of the OOM, with 1.75G allocated elsewhere.

This resulted in the cache autotuning not kicking in. The real question is what is actually taking up all that memory, and why isn't it being allocated through jemalloc?

Native code (Synapse contains Rust code) won't use jemalloc to allocate memory, so that could be one explanation. Imported libraries that use C/C++ extensions are another contender. Short of breaking out a Python memory profiler, however, it's hard to say.

As a workaround, I'd recommend reducing your max_cache_memory_usage and target_cache_memory_usage values by half to allow the autotuning to kick in.
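
For example, roughly halving the values from your config above would look something like this (the exact numbers are only a suggestion):

caches:
  cache_autotuning:
    max_cache_memory_usage: 1024M
    target_cache_memory_usage: 896M
    min_cache_ttl: 1m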

anoadragon453 avatar Oct 09 '24 13:10 anoadragon453

This approach won't work, because the jemalloc memory graph is not at its maximum at the moment physical memory grows. In other words, there is no criterion by which to tell when the memory should be cleared.

In addition, if it's not the memory allocated by jemalloc that is growing, then clearing the jemalloc-allocated memory won't save me.

kosmos avatar Oct 09 '24 14:10 kosmos

Good point, yes. I think the next step is tracking down what exactly is taking up the memory in your homeserver when it OOMs, and from there figuring out why it isn't being allocated by jemalloc.

Alternatively, having the ability to discard caches based on total application memory versus only jemalloc-allocated memory could work.
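
(As a rough illustration, total process memory could be read from /proc like this — a minimal sketch, not something Synapse does today:)

def process_rss_bytes() -> int:
    # Current resident set size of this process (Linux only; /proc reports kB).
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) * 1024
    return 0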

anoadragon453 avatar Oct 10 '24 15:10 anoadragon453

I still feel that there is a memory leak somewhere in the native libs, and it is important to find it.

kosmos avatar Oct 10 '24 16:10 kosmos

Hi,

I think we may suffer from the same problem. We run Synapse 1.113 on a deployment with 2000 daily active users, and memory usage grows indefinitely (?). We use a single process, no workers, with the Docker image and the following configuration:

event_cache_size: 15K
caches:
  global_factor: 2.0
  sync_response_cache_duration: 2m
  cache_autotuning:
    max_cache_memory_usage: 1500M
    target_cache_memory_usage: 1000M
    min_cache_ttl: 5m

We had the same behavior without the cache_autotuning part (we added the autotuning as part of our experiments to solve this, without improvement). The memory drop on 04/10 is a restart of the container.

I attach a few graphs (images):

- RAM usage
- GC time
- Object counts
- Reactor

We observe that RAM grows and the number of objects stays roughly the same day after day, but GC time becomes longer and longer. At some point Synapse becomes quite unresponsive, probably because of this, and we need to restart it.

Cheers François

flesueur avatar Oct 15 '24 13:10 flesueur

(image: memory graph of the initial sync worker)

I've seen a similar picture on the initial sync worker. The graph shows the total memory consumed by the worker (top line) and the memory allocated by jemalloc (bottom line). As you can see, the two lines are sometimes not parallel. The difference between them grows over time, which I believe indicates a memory leak in the native code (the non-Python part of Synapse).

kosmos avatar Nov 05 '24 13:11 kosmos

Same issue with 1.114.0rc1. I added tracemalloc to investigate; here are the top 50 lines.

(image: tracemalloc top 50 lines)
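
For anyone wanting to reproduce this, a minimal tracemalloc sketch (the exact hook point inside Synapse is an assumption) would be something like:

import tracemalloc

tracemalloc.start(25)  # keep up to 25 stack frames per allocation

# ... later, e.g. from a debug hook or a periodic task ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:50]:
    print(stat)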

skazancev avatar Apr 23 '25 13:04 skazancev

It looks like it was fixed in 1.116.0, but there is another issue that I fixed in my repository.

notifier.py


    def remove(self, notifier: "Notifier") -> None:
        """Remove this listener from all the indexes in the Notifier
        it knows about.
        """

        for room in self.rooms:
            lst = notifier.room_to_user_streams.get(room, set())
            lst.discard(self)

            # remove room to avoid memory leaks
            if not lst:
                notifier.room_to_user_streams.pop(room, None)
            # ---

        notifier.user_to_user_stream.pop(self.user_id)

notifier.room_to_user_streams grew to ≈20 MB in just 12 hours.
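
The underlying pattern is just a dict of sets whose empty entries were never removed; a minimal illustration (not the actual Synapse code):

room_to_user_streams: dict = {}

def add_stream(room: str, stream: object) -> None:
    room_to_user_streams.setdefault(room, set()).add(stream)

def remove_stream(room: str, stream: object) -> None:
    streams = room_to_user_streams.get(room, set())
    streams.discard(stream)
    # Without this cleanup the empty set stays behind forever,
    # so the dict grows by one entry for every room ever seen.
    if not streams:
        room_to_user_streams.pop(room, None)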

BTW, here is the difference between 1.114.0 (green) and a modified notifier.py from 1.116.0 with a minor change (yellow).

(image: memory graph comparing 1.114.0 (green) and the patched 1.116.0 (yellow))

skazancev avatar May 01 '25 02:05 skazancev

@skazancev Looks reasonable, thank you! Would you be willing to chuck up a quick PR with the fix?

anoadragon453 avatar May 01 '25 12:05 anoadragon453

@anoadragon453 sure, #18380

skazancev avatar May 01 '25 18:05 skazancev