[SQL] Optimize the database statements
Requirements
- [X] Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
- [X] Did you check to see if this issue already exists?
- [X] Is this only a feature request? Do not put multiple feature requests in one issue.
- [X] Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
Is your proposal related to a problem?
Lemmy's SQL statements have some major issues. Besides the one that surfaced in #3306, there are a number of others:
paging is unstable
Paging in the database is unstable. For example: we fetch 100 comments, ordered by hot ranking. The user reads those comments and scrolls on. In the meantime, the comment at position 101 gets upvoted, climbs the ranking and is now at position 90. The user will not see this comment, because the database content, and with it the order, is now different.
paging hurts the database
ORDER BY, LIMIT and OFFSET are a bad combination. The database performs the following steps (in this order):
- Execute the statement
- Sort the WHOLE result set (if we have many columns, we sort many columns; if they are wide, we sort a lot of data)
- Seek to the offset position
- Return $LIMIT entries
If the result set is large and PostgreSQL cannot fit it in work memory (work_mem), it sorts on disk. If the topic is hot, every request starts a sort on disk, killing the server with I/O.
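To make the problem concrete, a minimal sketch of the pattern (table and column names are simplified and not necessarily Lemmy's exact schema):

```sql
-- Illustrative only: simplified schema, not the real Lemmy statements.
-- The whole result set for the post has to be sorted before OFFSET/LIMIT apply.
SELECT c.*, ca.hot_rank
FROM comment c
JOIN comment_aggregates ca ON ca.comment_id = c.id
WHERE c.post_id = 123
ORDER BY ca.hot_rank DESC
LIMIT 100 OFFSET 300;
```

Running EXPLAIN (ANALYZE, BUFFERS) on such a statement against a hot post will typically show a full Sort node feeding the Limit, and "Sort Method: external merge Disk: ..." once the set no longer fits into work_mem.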
ultra wide joins
There are many statements that do a huge number of joins. The PostgreSQL query optimizer is based on per-table statistics. While queries with a few joins can be planned very efficiently, the plans can go bad really fast once there are many joins.
The reason for this is that the planner does not know about the correlation between columns in different tables.
Fulltext search
There is no fulltext search index anywhere in the schema.
Describe the solution you'd like.
paging
Avoid paging in the database at all costs. One way to do this, for example (sketched below):
- Fetch all IDs to show (only the IDs) and, if required, order them first
- Store these IDs somewhere (session, return via API, whatever fits)
- The client takes the first 100 IDs and requests them for viewing
- For the next page, the client requests the next 100 IDs
The ordering stays stable, no paging is required on the database side.
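A minimal sketch of that flow (schema and numbers are illustrative, not the actual Lemmy queries):

```sql
-- Step 1: fetch only the ordered IDs, once. Sorting a narrow (id, sort key)
-- set is far cheaper than sorting the full rows.
SELECT c.id
FROM comment c
JOIN comment_aggregates ca ON ca.comment_id = c.id
WHERE c.post_id = 123
ORDER BY ca.hot_rank DESC;

-- Step 2: the client sends back a slice of those IDs; the stored order is
-- re-applied via WITH ORDINALITY, no ORDER BY over the data itself.
SELECT c.*
FROM comment c
JOIN unnest(ARRAY[42, 17, 99]) WITH ORDINALITY AS ids(id, ord)
  ON c.id = ids.id
ORDER BY ids.ord;
```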
Joins
The join issue is a bit more difficult to solve, but a good approach is CTEs. Materialized CTEs can be used to force an execution order, while normal CTEs leave it to the planner but provide readability and features that are not otherwise possible (like recursion for comment trees).
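A rough sketch of both ideas (hedged: the parent_id column is assumed purely for illustration, Lemmy itself stores an ltree path; MATERIALIZED requires PostgreSQL 12+):

```sql
-- Recursive CTE walking a comment tree from the top-level comments of a post.
WITH RECURSIVE tree AS (
    SELECT c.id, c.parent_id, 1 AS depth
    FROM comment c
    WHERE c.post_id = 123 AND c.parent_id IS NULL
  UNION ALL
    SELECT c.id, c.parent_id, t.depth + 1
    FROM comment c
    JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree;

-- Materialized CTE: forces the narrowing step to be planned as its own unit
-- before the heavy joins run.
WITH top_comments AS MATERIALIZED (
    SELECT comment_id
    FROM comment_aggregates
    ORDER BY hot_rank DESC
    LIMIT 100
)
SELECT c.*
FROM top_comments t
JOIN comment c ON c.id = t.comment_id;
```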
Fulltext search
Using tsvector operations should fix this.
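For example, something along these lines (column names are an assumption, not the real schema; the expression in the index has to match the one in the query for the index to be used):

```sql
-- GIN index over a tsvector expression on posts.
CREATE INDEX idx_post_fts
    ON post
    USING gin (to_tsvector('english', coalesce(name, '') || ' ' || coalesce(body, '')));

-- Query using the same expression plus a tsquery.
SELECT id, name
FROM post
WHERE to_tsvector('english', coalesce(name, '') || ' ' || coalesce(body, ''))
      @@ plainto_tsquery('english', 'database performance');
```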
Describe alternatives you've considered.
There is none. The current statements are a real issue, if not the biggest issue, for Lemmy's performance and stability.
Additional context
I'm willing to help here. But this requires a deep understanding of the logic behind the queries, and what should be achieved with them.
Feel free to contact me on lemmy (vapeloki@lemmy.{ml,world}) or visit us in [email protected]
I suggest that a config switch be available to turn off inserts into the activity table. From what I understand, it's only used for debugging and nothing reads them. It's just creating more I/O.
@tbe You can pull down the code and run it locally, to check the DB to see the current indexes and such.
The current index on the path column is "idx_path_gist" gist (path)
ltree is Postgres's official way to store / query tree-like data.
The difficulty, which is out of my SQL depth, is how to do tree paging. I do have a setup in place which is able to page depth, but not the top-level comments. It's a very complicated topic that I could definitely use some assistance with.
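For reference, a hedged sketch of how that ltree path column can be queried (assuming paths of the form '0.1534.8772'; this is not a solution to the paging problem, just the building blocks):

```sql
-- All descendants of comment 1534 (served by the GiST index on path).
SELECT id, path
FROM comment
WHERE path <@ '0.1534'::ltree;

-- Limit how deep the first fetch goes.
SELECT id, path
FROM comment
WHERE path <@ '0.1534'::ltree
  AND nlevel(path) <= nlevel('0.1534'::ltree) + 3;
```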
I've re-opened that issue here because I clearly did not solve it, now that we're getting comment threads with thousands of top-level comments: https://github.com/LemmyNet/lemmy/issues/949.
You can pull down the code and run it locally, to check the DB to see the current indexes and such.
Although lemmy-helper is clunky, it has a web interface to view the database indexes. Here is a link to my live server running 0.18.0: https://lemmyadmin.bulletintree.com/query/pg_indexes?output=table
As to the wide join speed issue, @johanndt found a great way to optimize the existing queries: to do a subselect before the rest of the heavy joins. https://github.com/LemmyNet/lemmy/issues/2994
Implementing that for a lot of the tables is a priority for me when I get done with a few other things.
I also don't see why you think fetching ALL results is going to be faster than limit and offset. Some comment tables have hundreds of thousands of comments. Fetching and loading them all in memory, then sorting in code, is not a good solution. Best to leave SQL to do what it's good at: querying data.
found a great way to optimize the existing queries: to do a subselect before the rest of the heavy joins. #2994
I personally pulled active queries out of the system and recommended a subselect approach before joins 7 days ago on Lemmy community !lemmyperformance - here is the comment: https://lemmy.ml/comment/797628
I've now spent over 80 hours of personal labor in the past 14 days on performance issues with Lemmy. We really need to encourage all servers to get the pg_stat_statements PostgreSQL extension installed, get the crash logs dumped out of the major sites, and identify the frequency of Rust failures and the weak points.
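For anyone wanting to follow that suggestion, the standard setup looks roughly like this (stock contrib module, nothing Lemmy-specific; mean_exec_time is the PostgreSQL 13+ column name, older versions call it mean_time):

```sql
-- postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Most frequently executed statements.
SELECT calls, mean_exec_time, rows, query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
```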
The difficulty, which is out of my SQL depth, is how to do tree paging. I do have a setup in place which is able to page depth, but not the top-level comments. It's a very complicated topic that I could definitely use some assistance with.
I had some more thoughts about it this weekend. And the best I can come up with is still the same: don't page inside the database.
I had a look at how other systems page. Paging is fine, as long as our order does not change. For example, for ordering by old, this may work. But as soon as we have trees, that will not work.
So, there are two options:
- Don't page at all. But this would do no good to API consumers on huge topics.
- Page on the client side. As recommended before: we build one tree, but without the contents, only the IDs, and let the client request the contents and metadata as required.
I know this will introduce breaking API changes, but I don't see an alternate route.
Added bonus: We can later on cache the contents of comments, and only ask the database if we don't have a specific comment yet. Caching paged trees does not work on active discussions.
Given the massive performance problems people are having, running pg_stat_statements and sharing data out of Beehaw, Lemmy.ml and Lemmy.world several weeks ago would have really saved all this time spent running around to a dozen different instances measuring the scope of the problem on a variety of hardware.
Back in November 2021 you guys were using pg_stat_statements - but when I recommended it in recent weeks as an urgent "call to arms" first step in the crisis of scaling the application, I was ignored.
I also don't see why you think fetching ALL results is going to be faster than limit and offset. Some comment tables have hundreds of thousands of comments. Fetching and loading them all in memory, then sorting in code, is not a good solution. Best to leave SQL to do what it's good at: querying data.
- Ordering only the IDs based on 2 or 3 columns needs far less memory (or disk) than ordering the whole result set
- It is stable. We only have to do this once per caller, not for each call
- ORDER BY, LIMIT and OFFSET are bad, as stated above, because the whole dataset has to be sorted before LIMIT and OFFSET are applied. So there is no difference in the runtime of a single sort, but since the sort has to be done for each call, there is a huge difference in how often it is executed.
Also, even if we find a way to do this better inside the database, we would still run into the issue that the ordering may change between the requests for page 1 and page 2, leading to duplicated and "hidden" comments.
As to the wide join speed issue, @johanndt found a great way to optimize the existing queries: to do a subselect before the rest of the heavy joins.
This is a pretty good way to do it. I would recommend rewriting this to use CTEs instead of subselects (see the sketch after this list), but the outcome should be the same.
Why CTEs?
- Better readability
- Easier to maintain
- CTEs can be used for different statements, without worrying about nesting and such
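To illustrate the difference, the same narrowing idea written both ways (purely illustrative schema, not the actual Lemmy query):

```sql
-- Subselect first, heavy joins afterwards.
SELECT p.id, pa.score, pe.name
FROM (
    SELECT id, creator_id
    FROM post
    WHERE community_id = 5
    ORDER BY published DESC
    LIMIT 20
) AS p
JOIN post_aggregates pa ON pa.post_id = p.id
JOIN person pe ON pe.id = p.creator_id;

-- The equivalent CTE: reads top-down, and the narrowed set can be referenced
-- from several places in the statement without re-nesting it.
WITH wanted_posts AS (
    SELECT id, creator_id
    FROM post
    WHERE community_id = 5
    ORDER BY published DESC
    LIMIT 20
)
SELECT p.id, pa.score, pe.name
FROM wanted_posts p
JOIN post_aggregates pa ON pa.post_id = p.id
JOIN person pe ON pe.id = p.creator_id;
```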
I also think that person-to-person blocking and alternate-language filtering should be considered for removal from the back-end API. It means that the database has to do the filtering for each individual user doing blocking of other users, and some people really like to ignore other human beings and have large lists of blocks.
I also think that how the user interface shows (tombstones) comments absent due to a delete/block/language filter can be rendered in different ways, so I suggest that the front-end app choose to do the hiding. Or at least have some kind of performance setting in the app as to whether a particular server operator wants to have higher hardware requirements because of such features (this kind of topic has been brought up in the !mediascale community).
Based on my admittedly limited test on enterprise.lemmy.ml with 0.18 release candidates (I spent about 8 hours), I believe that blocking a person in a comment tree ends up hiding comments that are branched beyond the block. But I may have misinterpreted the situation, I need to test this more.
From a performance perspective, sorting makes repeat queries difficult to cache alone, but having each user ask the database to filter on every fetch is heavy and a potential denial of service opening.
It means that the database has to do the filtering for each individual user doing blocking of other users, and some people really like to ignore other human beings and have large lists of blocks.
And this is fine with negative joins. PostgreSQL is really amazing here on the generated plans. I wouldn't worry about this, as, like @dessalines wrote, that is what the database is good at.
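For illustration, one shape such a negative join can take (the person_block table and column names are assumptions, not checked against the real schema):

```sql
-- Keep only comments whose author is NOT blocked by the requesting user (id 42).
SELECT c.id, c.content
FROM comment c
LEFT JOIN person_block pb
       ON pb.person_id = 42              -- the requesting user
      AND pb.target_id = c.creator_id    -- the comment author
WHERE c.post_id = 123
  AND pb.person_id IS NULL;              -- no matching block row -> keep the comment
```

PostgreSQL will usually turn this (or the equivalent NOT EXISTS) into a hash anti-join, which stays cheap even with large block lists.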
From a performance perspective, sorting makes repeat queries difficult to cache alone, but having each user ask the database to filter on every fetch is heavy and a potential denial of service opening.
The idea should be that later on we can cache as much as possible. But at least a TOC for the current comments must be fetched fresh. And those uncacheable queries should be as lightweight as possible.
paging is unstable
Paging in the database is unstable. For example: we fetch 100 comments, ordered by hot ranking. The user reads those comments and scrolls on. In the meantime, the comment at position 101 gets upvoted, climbs the ranking and is now at position 90. The user will not see this comment, because the database content, and with it the order, is now different.
I don't see why this is a problem? Literally every other forum software works this way. Even Reddit works this way. If you view the front page, and then after a while load page 2 (old reddit at least) then the posts might've moved around as you say and you might've missed something. But it does not really appear to be a big issue, since most people will scroll scroll scroll... or page page page, and then go back to the front page and reload.
I don't feel like this is surprising to the user, and in fact is probably what they would expect.
Ultimately it's not up to me, but I don't think paging is broken and adding lots of complexity is not worth it. Or maybe I'm misunderstanding the issue.
And this is fine with negative joins. PostgreSQL is really amazing here on the generated plans. I wouldn't worry about this, as, like @dessalines wrote, that is what the database is good at.
One of the major problems is that there is little to no caching at all, no second-tier caching that is routine in server applications.
Rust does not seem immune to this very fundamental client/server application problem, which is showing up as "blue screen of death"-equivalent code failures on the front page of every major Lemmy site under modest load. A lot is being hidden by people not openly sharing their Rust crash/exception logs.
Community: /r/Rust
Posting: Tips on scaling a monolithic Rust web server?
Date: December 26, 2022 - fresh, 6 months ago
https://www.reddit.com/r/rust/comments/zvt1mu/comment/j1uxjs5/?utm_source=share&utm_medium=web2x&context=3
I don't see why this is a problem? Literally every other forum software works this way. Even Reddit works this way. If you view the front page, and then after a while load page 2 (old reddit at least) then the posts might've moved around as you say and you might've missed something. But it does not really appear to be a big issue, since most people will scroll scroll scroll... or page page page, and then go back to the front page and reload.
This is not so much an issue for Posts. But it is for comments. And Reddit does not work this way for comments. They don't page comments in this way. If I click on "show more replies", the client-side JS sends a token containing the information to identify the first paging data, and proceeds from there.
Also, some "forum" software does not need to order things the way Lemmy does. There is no "hot", "active", "top" ordering. The ordering is always "oldest first". So there is no paging issue regarding the order.
This is not so much an issue for Posts. But it is for comments.
Ah okay! I was thinking of posts, yes. You're right.
One of the major problems is that there is little to no caching at all, no second-tier caching that is routine in server applications.
You can cache the contents of comments, if you are willing to "hide" edits for a few seconds or so. But you cannot cache the comment tree, as this is a thing that changes fast. Only caching requested stuff makes sense, so filtering in the database is a viable thing, as it is a non-cacheable result anyway.
The logs of the big servers hold all the secrets of just how frequently code is crashing / throwing exceptions internally. The lemmy-ui doesn't even properly handle a "timeout" and say to end-users: "the lemmy database is too slow, this is a database problem" - today, on 0.18.0 on lemmy.ml, I'm getting JSON errors as it tries to parse the word "Timeout" as a JSON response. The Rust code is hiding the problems, and the application crash logs of lemmy_server are a platinum mine of scaling problems.
https://lemmy.ml/post/1533618
The Rust code is hiding the problems, and the application crash logs of lemmy_server are a platinum mine of scaling problems.
Can we please limit this discussion to SQL/Database stuff? The API issue, that the error handler of the rust server does not return JSON, is a separate topic.
Can we please limit this discussion to SQL/Database stuff?
If it isn't the SQL faulting, what is it? Are you guys literally not understanding that the SQL backend is falling over? Are you unable to trace these front-end errors all the way back to the SQL statements? That's what I've been doing for 80+ hours in 14 days.
On !lemmyperformance, I made "SQL" the icon for the community. Go look.
If it isn't the SQL faulting, what is it?
We have trouble with SQL, that's why I'm here after all. Yes. But that an error handler on the HTTP side does not respond in JSON - that can happen with anything, and is clearly not SQL related.
that can happen with anything, and is clearly not SQL related.
"clearly", no it isn't "clear" n any way shape or form, and I assure you that I know very well that it is indeed SQL problems that are being masked by the UI - further, it is also being masked by server operators not dumping their logs here on GitHub to show that.
The problem here is that if you start a new server, it is empty of data, and you do not see how much the whole project is hiding what is very basic error handling and logging of SQL problems. Even the testing server, enterprise.lemmy.ml - is mostly empty of data.
Denial is what is going on, "reality distortion field" that the hiding of SQL crashes (Rust error logic) isn't at the root of the problem. "clearly", no, it isn't "clear" that this is being understood! The clarity is this Issue showing someone actually looking at pg_locks: https://github.com/LemmyNet/lemmy/issues/3061 - that was "closed" so project people can feel good about closing issues!
Your API is the middleman to everything; logging problems in your API can go to a simple disk file (not just throw everything into the system journal) even if the primary database is falling over. This is a hidden, behind-the-scenes server application - faults have to bubble up to end-users in useful, specific, identifiable error messages - and the server operator should have at least the most basic screen to know that their API is falling over. The front page showing nginx 500 errors is, as I said earlier, "blue screen of death"-level application failure.
Who here has gone out of their way to publish an entire application dedicated to looking at PostgreSQL performance problems within Lemmy? Please stand up!
"Clear" and "clearly" do not belong anywhere near this application as it stands in June 2023.
Last time I try this, @RocketDerp:
Yes, that the JSON API errors are not JSON in many cases is an issue, but not this one. It also isn't about who has done what.
This is an issue tracker, not a post on a lemmy instance. Open a new issue for the error messages, and keep this issue clean. Otherwise, it becomes unmanageable for the maintainers and for the people who are willing to contribute.
During load testing, I have seen that there is one statement that is executed far too often and leads to duplicate entries in the database:
INSERT INTO "local_user_language" ("local_user_id", "language_id") VALUES ($1, $2) RETURNING "local_user_language"."id", "local_user_language"."local_user_id", "local_user_language"."language_id"
This statement is responsible for the most writes, and the most WAL records of all.
I tried to track it down, but I cannot find the source. I don't understand the ORM.
@dessalines: any clue where to start?
maybe this: https://github.com/LemmyNet/lemmy/blob/ad6f244b618dd7cf2aa4bd8876e378ba62b35016/crates/db_schema/src/impls/actor_language.rs#L108
Thank you @L3v3L! Not as bad as I thought, but still not good. If a user signs up without a given language, all 184 language IDs are inserted into this table.
That local_user_language is kind of a mess, and it very much needs simplifying.
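A minimal sketch of one way to at least stop the duplicate rows (assuming the table has no such constraint yet; how this interacts with the Diesel code and its RETURNING clause would still need to be checked):

```sql
-- One-time: enforce uniqueness per (user, language) pair.
ALTER TABLE local_user_language
    ADD CONSTRAINT local_user_language_uniq UNIQUE (local_user_id, language_id);

-- Insert that silently skips rows that already exist.
INSERT INTO local_user_language (local_user_id, language_id)
VALUES ($1, $2)
ON CONFLICT (local_user_id, language_id) DO NOTHING;
```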
As I mentioned in this issue's comments on June 26, I asked for pg_stat_statements data out of the big servers (called out by name June 22, https://lemm.ee/comment/350801) - well, finally one of them (lemmy.world) ran pg_stat_statements and shared some details of the output, which uncovered one of the major causes of overload/application crashes; pull request: https://github.com/LemmyNet/lemmy/pull/3482
I continue to plead that the project leaders recommend all servers install this extension until the scaling/performance crisis is resolved. I directly recommended it be made a topic of the 0.18.0 release https://lemmy.world/comment/474829, and I am still begging that it happen in 0.18.1 - fully disclose the major performance problems and the critical role of using pg_stat_statements in real-world server spot-checking, so that attention is put on specific queries. Something perhaps as drastic as putting a publicly accessible API call in place to get the results of querying pg_stat_statements from live servers (returned as JSON):
SELECT queryid, calls, rows, mean_exec_time, query, * FROM pg_stat_statements ORDER BY calls DESC
Thank you.
There's a matrix chat for that you can join, where people are providing logs and showing slow queries. You can also do that here, just make an issue and use the DB tag, so that people can work on it.
Also @RocketDerp be respectful, or we will block you from this repo.
Also @RocketDerp be respectful, or we will block you from this repo.
I have autism, and I do not grasp what you think I am doing wrong socially that is disrespectful. Please explain, elaborate. Exact quotes, please. I said "thank you".
I love you, I love all people. What has been disrespected here?