synapse icon indicating copy to clipboard operation
synapse copied to clipboard

Partial revert #17044 to fix sql performance regression

Open csett86 opened this issue 1 year ago • 9 comments

For details see #17129. Compared to a full revert, we keep the class method instroduced in #17129, as its used elsewhere by now

Fixes: #17129 Signed-off-by: Christoph Settgast [email protected]

Pull Request Checklist

  • [x] Pull request is based on the develop branch
  • [x] Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • [ ] Code style is correct (run the linters)

csett86 avatar May 04 '24 16:05 csett86

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar May 04 '24 16:05 CLAassistant

@csett86 have you tried this and seen that it helps? The queries should basically be the same so am a bit confused how the revert would help, but if you've confirmed it does then I'll have a deeper look

erikjohnston avatar May 08 '24 13:05 erikjohnston

Yes @erikjohnston, I have deployed this for 10 days now and it helps immensely: https://github.com/element-hq/synapse/issues/17129#issuecomment-2080918653

This is on a public homeserver with a reasonable amount of rooms and active users, where postgres was maxing out without this revert. See the issue #17129 for details and others reporting the same.

csett86 avatar May 08 '24 16:05 csett86

Hi @csett86, I would be very interested to know if the patch in https://github.com/element-hq/synapse/pull/17169 helps?

erikjohnston avatar May 09 '24 12:05 erikjohnston

@csett86 I can confirm your mr also fixes my problem, I'm using your patch on a docker build, io is back to normal as well as cpu usage, thanks !

PhieF avatar May 11 '24 11:05 PhieF

We don't currently think this is safe, but we're still looking into how to fix the performance regression in a safe way.

reivilibre avatar May 29 '24 11:05 reivilibre

@reivilibre for now I'm keeping this patch on my server, without it it is fully unusable (both synapse and other services)

PhieF avatar Jun 01 '24 13:06 PhieF

FWIW, I applied this patch on my server last week (29-01 at 16:00), and this was the effect on my database server's memory usage:

Graph depicting memory usage, showing a sawtooth pattern varying between 0.5 GiB and 4.5 GiB up to 29-01 at 16:00, after which it remains mostly constant around 700 MiB

(the memory drops are where Postgres gets OOM-killed)

14mRh4X0r avatar Feb 01 '25 17:02 14mRh4X0r

We don't currently think this is safe, but we're still looking into how to fix the performance regression in a safe way.

@reivilibre What are the perceived risks of running with this today? Any progress on the issue?

3nprob avatar May 16 '25 00:05 3nprob