Possible mitigations for occasionally slow database requests
I think we've seen some encouraging results from moving some queries over to the read-only replica in #2073, but I think there are other changes we could make to improve the performance of our primary database as well.
Problem description
Occasionally we see bursts of slow queries. Moving just a few of our expensive queries to the read-only replica has helped, but we've only moved a few hundred queries per hour to the replica and I don't believe we have actually addressed the root cause here.
In particular, I'm surprised to see that we have a lot of auto-vacuums occurring on the crates table. We're seeing 2 to 3 auto-vacuums per hour. We're also seeing frequent vacuuming of pg_toast_25513, which I'm guessing is the TOAST table associated with the crates table.
What I expect is happening is that after running the update_downloads background job we end up with a lot of old tuples in the crates table. This table includes several columns that can become wide:

- character varying
  - description
  - homepage
  - documentation
  - readme - can become very large
  - repository
- tsvector
  - textsearchable_index_col
Proposed mitigations
Improve the update_downloads job
We currently grab all version_downloads that need to be processed, and then loop over those results, committing 1 transaction for each. This means that if a popular crate has download requests for 10 different versions, then we will commit to the crates table 10 times for that crate, quickly creating garbage in this table.
If we instead grouped the initial query by crate_id, we could batch the work for a crate into a single transaction.
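A rough sketch of what the initial query might look like (assuming we can join through the versions table to get the crate_id); the job would then open one transaction per crate_id group rather than one per row:

-- Sketch only: pull the pending rows grouped by crate so that all versions of
-- a crate can be propagated in a single transaction.
SELECT versions.crate_id,
       version_downloads.version_id,
       version_downloads.downloads - version_downloads.counted AS amt
  FROM version_downloads
  JOIN versions ON versions.id = version_downloads.version_id
 WHERE version_downloads.processed = 'f'
   AND version_downloads.downloads != version_downloads.counted
 ORDER BY versions.crate_id;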
More aggressively TOASTing on the crates table
The other part of the problem seems to be that rows in the crates table can be large. By default PG will compress these columns once the row grows past 2kB, but the row can grow to 8kB before columns are evicted to the TOAST table. So we're creating a lot of garbage tuples that may be large.
I don't see any obvious way to force these columns into TOAST. None of the four available storage strategies seem to force a column into external storage. We could ALTER TABLE crates SET (toast_tuple_target = 256) or some other small number, but I'm also not entirely certain this will actually push more data out to TOAST. It should make PG more aggressive about compressing the data, but I'm a bit unclear on whether this will force anything into TOAST before hitting the 8kB maximum row length.
Importantly, the documentation does say "During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change." This means that if we can force more data into TOAST then our updates of the download counts should not result in churn to that table. (Publishing new versions of a crate will update these values and will still cause some churn, but this is obviously much lower in volume than downloads.)
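Concretely, the experiment might look something like the following sketch (readme and description picked as the widest columns; whether this actually moves data out-of-line before the 8kB limit would need to be verified on a staging copy first):

-- Sketch only: lower the per-table threshold at which the TOAST code kicks in,
-- and mark the widest columns EXTERNAL so they are stored out-of-line
-- (uncompressed) once that threshold is crossed.
ALTER TABLE crates SET (toast_tuple_target = 256);
ALTER TABLE crates ALTER COLUMN readme SET STORAGE EXTERNAL;
ALTER TABLE crates ALTER COLUMN description SET STORAGE EXTERNAL;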
Leave more empty space in the crates table
I can't remember the specific knob that configures this, but I recall reading that there is a way to encourage PG to leave some extra space within each 8kB page. As I understand it, this increases the chance that PG can fit an updated row into the same page as the old row, which helps reduce the overhead of updating indexes.
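If the setting I'm thinking of is the per-table fillfactor (I'd want to double-check that), the change would be roughly:

-- Sketch: reserving some free space in each page gives UPDATEs a chance to
-- keep the new row version on the same page (a HOT update), which avoids
-- touching the indexes. This only affects newly written pages; existing data
-- would need a VACUUM FULL or CLUSTER to be repacked.
ALTER TABLE crates SET (fillfactor = 80);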
Manually extract columns from the crates table
This would be a more invasive change, as we would have to update any queries that use these columns. It's also not clear to me exactly how we would go about that with minimal impact on the production database, but if the above solutions don't provide sufficient improvements then this is something we can explore.
cc @sgrif
What's leading you to believe that the auto vacuums on this table are causing problems? The crates table is extremely small relative to the rest of the database, and I'd be shocked if vacuums there are causing serious performance issues. version_downloads is a much more likely candidate. I suspect most spikes are due to increased IO load during refresh_recent_crate_downloads more than anything else.
What's leading you to believe that the auto vacuums on this table are causing problems?
I don't think the vacuum itself is the problem, but I think it indicates that some tuning might help here. It looks like we have 34,854 crates in total, and based on the background job logs for a 1-hour period it looks like we made 25,088 updates to this table while processing downloads. That's a lot of churn, and I expect this table is pretty fragmented, impacting data locality and probably the indexes for this table as well.
Now that I think of it, it is much better to just pull out the download count into a crate_downloads table, since updating that column is leading to the churn here. Coupling a small but high-update column with wide columns that change infrequently is the real root of my concern.
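Something roughly like this, as a sketch only (names and types are illustrative, and the migration path would need more thought):

-- One small, hot row per crate, kept separate from the wide, rarely-updated
-- columns in crates.
CREATE TABLE crate_downloads (
    crate_id INTEGER PRIMARY KEY REFERENCES crates (id),
    downloads BIGINT NOT NULL DEFAULT 0
);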
version_downloads is a much more likely candidate.
I agree that version_downloads is suspect as well. When you upgraded to PG 11 I think you mentioned that some tables were good candidates for partitioning. Partitioning on processed = 't' or on dates more than 90 days ago is probably worth exploring.
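For the date-based option, a declarative partitioning sketch might look like this (table name and column types are assumptions based on the queries in the logs):

-- Illustrative sketch only.
CREATE TABLE version_downloads_by_date (
    version_id INTEGER NOT NULL,
    downloads INTEGER NOT NULL,
    counted INTEGER NOT NULL,
    processed BOOLEAN NOT NULL,
    date DATE NOT NULL,
    PRIMARY KEY (version_id, date)
) PARTITION BY RANGE (date);

-- One partition per quarter (or month); older partitions become effectively
-- read-only once their rows are processed.
CREATE TABLE version_downloads_2020_q1
    PARTITION OF version_downloads_by_date
    FOR VALUES FROM ('2020-01-01') TO ('2020-04-01');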
Yeah, unfortunately we can't partition on a boolean for this. Just the date should be fine, but the existing BRIN index should have the same effects given the write patterns we have (consistently tied to date). I wasn't able to get good improvements from partitioning in my tests, but it's probably worth exploring further.
Okay, thanks for the details. I'm not surprised you've dug into that already.
You may want to take a look at the logs for program:app/postgres at "Jan 19 18:53:22". Several queries trying to count new downloads get stuck waiting on a lock held by process program:app/postgres.16717. That process was certainly handling download count updates at that time. The logs from that process:
Jan 19 18:51:51 crates-io app/postgres.16717 [CRIMSON] [10-1] sql_error_code = 00000 LOG: duration: 4032.084 ms execute __diesel_stmt_2: SELECT "version_downloads"."version_id", "version_downloads"."downloads", "version_downloads"."counted", "version_downloads"."date", "version_downloads"."processed" FROM "version_downloads" WHERE "version_downloads"."processed" = $1 AND "version_downloads"."downloads" != "version_downloads"."counted"
Jan 19 18:51:51 crates-io app/postgres.16717 [CRIMSON] [10-2] sql_error_code = 00000 DETAIL: parameters: $1 = 'f'
Jan 19 18:54:09 crates-io app/postgres.16717 [CRIMSON] [11-1] sql_error_code = 00000 LOG: duration: 3559.983 ms execute <unnamed>: UPDATE "version_downloads" SET "processed" = $1 WHERE "version_downloads"."date" < date(CURRENT_TIMESTAMP) AND "version_downloads"."downloads" = "version_downloads"."counted" AND "version_downloads"."processed" = $2
Jan 19 18:54:09 crates-io app/postgres.16717 [CRIMSON] [11-2] sql_error_code = 00000 DETAIL: parameters: $1 = 't', $2 = 'f'
Jan 19 18:54:20 crates-io app/postgres.16717 [CRIMSON] [12-1] sql_error_code = 00000 LOG: duration: 11114.928 ms execute __diesel_stmt_3: SELECT refresh_recent_crate_downloads()
and the logs for the scheduler and worker:
Jan 19 18:51:46 crates-io app/scheduler.8084 Enqueueing background job: update_downloads
Jan 19 18:51:51 crates-io app/background_worker.1 Updating 2832 versions
Jan 19 18:54:06 crates-io app/background_worker.1 Finished updating versions
Jan 19 18:54:09 crates-io app/background_worker.1 Finished freezing old version_downloads
Jan 19 18:54:20 crates-io app/background_worker.1 Finished running refresh_recent_crate_downloads
This temporary glitch was definitely the cause of the small batch of slow download requests for utf8-ranges 1.0.4 that completed 2 seconds later at 18:53:24. Those overall response times ranged from 1053ms to 2802ms.
It makes sense that counting new downloads would be temporarily blocked by the transaction propagating the download counts, but the transaction in that job is small:
1. Update the counted column of the local row in version_downloads
2. Update the downloads count in the versions table
3. Update the downloads count in the crates table
4. Update the total_downloads count in the metadata table
For some reason, it looks like this transaction held a lock taken in step 1 for 2+ seconds. Out of the 3 following updates, my guess is that updating crates probably caused the delay. The exact cause may not matter, as I think the easiest fix is to move step 1 to occur last. (I'd still like to investigate breaking out crate_downloads though.)
The logic already calculates amt based on reads made at the beginning of the background job (that first query lasting 4032ms). That's fine because any downloads occurring after that read will be counted the next time the job runs. If we defer the update to version_downloads.counted until immediately before the commit, it should eliminate the contention between counting new downloads and propagating previous downloads.
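Roughly, the reordered transaction would look like this (a sketch only; :amt, :crate_id, :version_id, and :date stand in for values the job has already computed from its initial read):

BEGIN;
UPDATE versions SET downloads = downloads + :amt WHERE id = :version_id;
UPDATE crates SET downloads = downloads + :amt WHERE id = :crate_id;
UPDATE metadata SET total_downloads = total_downloads + :amt;
-- Taking the version_downloads row lock last keeps new download writes blocked
-- for as short a window as possible.
UPDATE version_downloads SET counted = counted + :amt
 WHERE version_id = :version_id AND date = :date;
COMMIT;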
Improve the update_downloads job
https://github.com/rust-lang/crates.io/pull/8131 implemented a change that reduces the number of reads to the crates and metadata tables. The job previously took an hour or more to process; now it's down to a couple of minutes again, with the recent_crate_downloads refresh taking the longest at about 10-15 minutes.