Module proposal: Cache `alloptions` to load only needed values
This proposes a module for the Performance Lab Plugin.
Overview
- Proposed module name: Conditional Option Autoloading:
- Proposed module slug: conditional_option_autoloading
- Proposed module owner GitHub username(s): pbearne
About the module
Early in the WordPress load, we load from the wp_options all the rows with the column autoload set "yes" this causes un-needed data to be stored in memory via the wp_load_alloptions function
The default value for autoloading is yes so unless it was a careful/knowledgable dev the opinion will set to autoload
This problem is increased as unless there is a deactivation function in a plugin then orphan rows(values) will still be loaded for plugins that have been removed.
This proposal is that only load the needed rows(values) for a given page
Purpose
By listening to calls get_option and recording the keys that are used for a current URL and are logged in
Then from keys get the ID for rows and save these IDs on shutdown to site cache (hopefully into Redis/Memcache)
Then on future calls to the same page, we use the keys to shortcut the get_all_options function to just the keys needed for that page.
To invalidate the cached ids we keep listening to every page in the get_option function to create a list of option names call for and then compare it to the list of options names for values we preloaded in get_all_options and if it has changed we save the new list of IDs
Only logged out for start until we have worked the best way to handle login and user ID But it will be sites with logged users that may benefit the most So we may need to look user ID in the key name and not add that key in case the plugin is storing user data in options
This will need a set of filters in WP core to enable this
- pre_option_all https://core.trac.wordpress.org/ticket/37930
- pre_get_alloptions https://core.trac.wordpress.org/ticket/56045
And we may need to load the code in sunrise.php or via Object-cache.php so we can run the code before wp_load_alloptions is called
Scope
We are not creating a caching service just calling the installed service (if any) We are not storing any option values just the IDs needed for the given page
Rationale
This issue may be described as a technical debt as it has resulted from the default setting being set to autoload to true So we are trying to reduce the size of the options data stored in mem for every page and not used
Other
Issues
If there are option names that have a user identifier included these will cause the cache to invalidate each use to partly mitigate this we may need to look for the user-id in the option_name we will need real-world testing to see if this is a problem
this will increase the size of the cache by one item per page
Some starter/proof of concept code https://github.com/pbearne/wordpress-develop/tree/conditional_options_preload
@felixarntz I'm not 100% on what the appropriate labels for this one would be; can you take a look? Thanks!
@bethanylang This is actually a new module proposal.
@pbearne In order to set up up for this module, can you slightly update the issue description to more formally propose this as a module, following the process documented in https://github.com/WordPress/performance/blob/trunk/docs/Proposing-a-new-module.md (mainly step 1)? #338 is a great example of a module proposal.
I think this sounds like a very promising idea. One follow-up thing that came up today during WCEU was that probably the main benefit of this will be for sites with a persistent object cache since it would resolve a critical problem where sometimes autoloaded options are too large to store in an object cache (related: https://github.com/WordPress/performance/issues?q=is%3Aissue+label%3A%22%5BModule%5D+Audit+Autoloaded+Options%22+). With that said though, it would be interesting to also test it on sites without one and compare the performance to how it was before. Since we would optimize the query, it may very well benefit us there too, maybe even more.
On sites with many pages/posts and/or many users this feature would lead to quite a lot of cache keys.
Good point from @tillkruss. Still, sites with a lot of users/posts/pages make a lot of requests to the database. Which I think is more expensive that a lot of keys.
Makes me think that we should have a way to clean up the options cache just as a countermeasure.
Yeah, it's a massive problem, maybe a sensible bytes threshold triggers this module to kick in? Count is kinda irrelevant, it's it's mostly Boolean/tiny values.
For some context, some hosts don't even allow alloptions to be cached if the key is bigger than 1MB in PHP. However I've seen sites with >800MB of alloptions loaded for every single page load 🥲
Just to clarify: The idea is to have a cache entry for each URL, not for each option. And the entry will contain only the options used in the current page.
@marioy47 yes a cache per URL one logged in and one logged out 2 caches The entries will only have the IDs of the options used on the page so we don't need to worry about invaliding the cache when option values change just when the options are needed for a given URL change. We do that by keeping a watch on the opt.ion called
I think to support even building a prototype of this module we need to finish #38. This would allow an entire cache group to be flushed, so we can wipe all alloptions for all URLs to be removed at once, when options change.
I will need this core change to add the needed filters https://core.trac.wordpress.org/ticket/56045 love to see these get added in 6.1
I have a remark about this fine proposal -- something to consider.
Background
You wrote we use the keys to shortcut the get_all_options function to just the keys needed for that page.
In a site with no persistent object cache: If I understand you correctly, this translates to a SQL statement shaped like this one on every page load.
SELECT option_name, option_value FROM wp_options WHERE option_id IN (17,42,1287,221);
That replaces
SELECT option_name, option_value FROM wp_options WHERE autoload = 'true';
Please correct me if I don't have this right: I believe the big performance win in the no-persistent-object-cache case will be the lower cost of the former query over the latter one.
Observation
Recent versions of MySQL and MariaDB satisfy the former query the IN() query -- by doing a so-called "multiple range read" (MRR). Different versions of MySQL / MariaDB handle these MRRs differently with older versions usually being dumber about it than recent versions. It's conceivable, in older versions, for the former query to be surprisingly inefficient, satisfied as a cascade of UNION ALL single-row SELECTS. In recent versions this is not a problem.
But, sigh, more than half of sites are on the old MySQL 5.5 and 5.6. https://wordpress.org/about/stats/ .
When coding and evaluating this module, it's important to evaluate the realized performance improvements, while keeping track of DBMS versions. (Worst case, punting a performance issue from one subsystem to another -- php to MySQL maybe -- isn't too good, especially seeing that MySQL is a bottleneck on many setups.)
Testing only on MySQL 8 and MariaDB 10.6 is a bad idea. Ask me how I know this sometime. -:)
@pbearne We definitely want to discuss this one in a future chat. Per @felixarntz's comment here, can you please update the issue description to include the items in the module proposal template?
Please correct me if I don't have this right: I believe the big performance win in the no-persistent-object-cache case will be the lower cost of the former query over the latter one.
and to reduce the size of the data fetched and stored in mem I have seen sites with 100's MB in options loaded but a faction use/needed for the current page even worse MB's of data loaded from options left behind after a plugin is removed
Here's a SQL query to show a histogram of the lengths of options stashed in wp_options. This includes all options including transients.
https://gist.github.com/OllieJones/3b9ad08a7191390dac8b952edc613596
This is a guess: in the vast majority of sites there will be a handful of big fat option values. Most option values are slender.
Obviously, the operators of "mature" WordPress sites will have deinstalled quite a few plugins over the years, and some of those plugins won't have cleaned up their option values upon deactivation or deinstallation. Detecting and removing fat options in that category strikes me as a site-health problem as much as a pageview-by-pageview operational problem. (These are actually plugin code-quality issues, but the offending plugins are long gone.)
As for currently-in-use fat autoloaded options, to my mind those reflect code-quality issues: is somebody storing too much stuff in an option?
With respect, maybe it would be best to start addressing the issue (it's a very real issue, to be sure) by an enhancement to the Performance Lab module for "Audit Autoloaded Options". Gathering some information on the fattest options might help with designing the simplest and most maintainable possible caching strategy.
That is my two picowatt hours worth of feedback.
The thing that worries me about this proposal is that for it to benefit normal WordPress sites (without a fast and persistent object cache), the storage of these options ids would have to go in the database.
- That's a lot of new rows in the database. A site with 50,000 unique URLs (not unreasonable) would generate 100,000 new options rows, as proposed.
- It would introduce frequent writes to the database on the frontend.
- Options autoloading would be impeded by two round trips to the database — one to get the list of IDs to load, and one to query those IDs.
- The MySQL query cache for "all options" would get fragmented, and bloated, as many pages would have slightly different queries.
When wp_options is abused by plugins, loading "all" options can slow down the User Experience. The case is:
- There are a lot of options, so loading all of them is time-consuming.
- The page only needs a small fraction of options.
Corollary: Pre-loading is 'always' slower when you only need a single option.