wp-graphql-smart-cache
wp-graphql-smart-cache copied to clipboard
Unable to purge post after changing status from `Draft` to `Published`
When the status of a post changes from Draft to Published, the query for specific post (project in the example) does not have an associated key as it did before. That is, we can't invalidate the keys, it will always be HIT until the Cache-Control expires.
Even updating the data in such a post does not purge anything because it no longer has an associated key.
All other events are working fine, as far as the post was not in Draft state (or restored from trash) and then turned to Published again.
Querying a completely new created post works fine.
Testing custom Project Post type (single post is present):
Query example (testing via Insomnia):
(projects) http://localhost/wp/graphql?query={projects{edges{node{id,title}}}}
(project) http://localhost/wp/graphql?query={project(id:"our-slug",idType:SLUG){id,title}}
First run:
projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDozNDU=
project query -> MISS
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query cG9zdDozNDU=
Change status to Draft:
[14-Jun-2023 09:21:47 UTC] (graphql_purge) key: cG9zdDozNDU=, event: post_DELETE, user: 1, page: /wp/wp-admin/admin-ajax.php
[14-Jun-2023 09:21:47 UTC] (graphql_purge) key: skipped:post, event: post_DELETE, user: 1, page: /wp/wp-admin/admin-ajax.php
projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project
project query -> MISS
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query
Change status back to Published:
[14-Jun-2023 09:22:52 UTC] (graphql_purge) key: list:project, event: post_CREATE, user: 1, page: /wp/wp-admin/admin-ajax.php
projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDozNDU=
project query -> HIT
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query
Post is updated:
[14-Jun-2023 09:26:35 UTC] (graphql_purge) key: cG9zdDozNDU=, event: post_UPDATE, user: 1, page: /wp/wp-admin/admin-ajax.php
[14-Jun-2023 09:26:35 UTC] (graphql_purge) key: skipped:post, event: post_UPDATE, user: 1, page: /wp/wp-admin/admin-ajax.php
projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDozNDU=
project query -> HIT
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query
New post is published:
[14-Jun-2023 09:32:00 UTC] (graphql_purge) key: list:project, event: post_CREATE, user: 1, page: /wp-json/wp/v2/project/1038
projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDoxMDM4 cG9zdDozNDU=
project query (new post)-> MISS
d9154add95f1fabfe6844d1716c23cd1b7ef0c4ca43d1ba867219e16859a88ed graphql:Query cG9zdDoxMDM4
purge-varnish-cache.php:
<?php
function purge_varnish_cache( $purge_keys ) {
// Determine the URL to be purged based on server host
$server_host = isset($_SERVER['HTTP_X_FORWARDED_HOST']) ? $_SERVER['HTTP_X_FORWARDED_HOST'] : $_SERVER['HTTP_HOST'];
// The server must be running Varnish Proxy, we must be able to target the host with a PURGE request (http only)
$purge_url = 'http://' . $server_host . '/';
// Create headers for the request with the purge keys
$headers = array(
'Xkey: ' . $purge_keys
);
// Initialize and configure a cURL request
$curl = curl_init();
curl_setopt_array(
$curl,
array(
CURLOPT_URL => $purge_url, // URL to purge
CURLOPT_CUSTOMREQUEST => 'PURGE', // HTTP method as PURGE
CURLOPT_RETURNTRANSFER => true, // Return response as string
CURLOPT_HTTPHEADER => $headers, // Set headers
)
);
// Execute the cURL request
curl_exec( $curl );
// Check for errors and log error message if any
if ( curl_errno( $curl ) ) {
$error_message = curl_error( $curl );
error_log( print_r( $error_message, true ) );
}
// Close the cURL session
curl_close( $curl );
}
// Add an action hook for purging cache using WP Graphql Smart Cache plugin
add_action( 'graphql_purge', function ( $purge_keys ) {
purge_varnish_cache( $purge_keys );
}, 10, 1 );
default.vcl Varnish config:
vcl 4.1;
import xkey;
import std;
backend default {
.host = "app.internal";
.port = "8080";
}
acl purge {
"0.0.0.0"/0; // Allow all IP addresses
"::0"/0; // Allow all IPv6 addresses
}
sub vcl_recv {
# Remove empty query string parameters
# e.g.: www.example.com/index.html?
if (req.url ~ "\?$") {
set req.url = regsub(req.url, "\?$", "");
}
# Remove port number from host header
set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
# Sorts query string parameters alphabetically for cache normalization purposes
set req.url = std.querysort(req.url);
# Remove the proxy header to mitigate the httpoxy vulnerability
# See https://httpoxy.org/
unset req.http.proxy;
# Add X-Forwarded-Proto header when using https
if (!req.http.X-Forwarded-Proto) {
if(std.port(server.ip) == 443 || std.port(server.ip) == 8443) {
set req.http.X-Forwarded-Proto = "https";
} else {
set req.http.X-Forwarded-Proto = "http";
}
}
# Set the X-Forwarded-Host header to the proxy host
set req.http.X-Forwarded-Host = "proxy-app.internal";
# Purge logic to remove objects from the cache.
if (req.method == "PURGE") {
if (client.ip !~ purge) {
return (synth(403, "Forbidden"));
}
if (req.http.xkey) {
set req.http.n-gone = xkey.softpurge(req.http.xkey);
return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
}
return(synth(200, "Purged"));
}
# Only handle relevant HTTP request methods
if (
req.method != "GET" &&
req.method != "HEAD" &&
req.method != "PUT" &&
req.method != "POST" &&
req.method != "PATCH" &&
req.method != "TRACE" &&
req.method != "OPTIONS" &&
req.method != "DELETE"
) {
return (pipe);
}
# Remove tracking query string parameters used by analytics tools
if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
set req.url = regsub(req.url, "\?&", "?");
set req.url = regsub(req.url, "\?$", "");
}
# Only cache GET and HEAD requests
if (req.method != "GET" && req.method != "HEAD") {
set req.http.X-Cacheable = "NO:REQUEST-METHOD";
return(pass);
}
# Mark static files with the X-Static-File header, and remove any cookies
# X-Static-File is also used in vcl_backend_response to identify static files
if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
set req.http.X-Static-File = "true";
unset req.http.Cookie;
return(hash);
}
# No caching of special URLs, logged in users and some plugins
if (
req.http.Cookie ~ "wordpress_(?!test_)[a-zA-Z0-9_]+|wp-postpass|comment_author_[a-zA-Z0-9_]+|woocommerce_cart_hash|woocommerce_items_in_cart|wp_woocommerce_session_[a-zA-Z0-9]+|wordpress_logged_in_|comment_author|PHPSESSID" ||
req.http.Authorization ||
req.url ~ "add_to_cart" ||
req.url ~ "edd_action" ||
req.url ~ "nocache" ||
req.url ~ "^/addons" ||
req.url ~ "^/bb-admin" ||
req.url ~ "^/bb-login.php" ||
req.url ~ "^/bb-reset-password.php" ||
req.url ~ "^/cart" ||
req.url ~ "^/checkout" ||
req.url ~ "^/control.php" ||
req.url ~ "^/login" ||
req.url ~ "^/logout" ||
req.url ~ "^/lost-password" ||
req.url ~ "^/my-account" ||
req.url ~ "^/product" ||
req.url ~ "^/register" ||
req.url ~ "^/register.php" ||
req.url ~ "^/server-status" ||
req.url ~ "^/signin" ||
req.url ~ "^/signup" ||
req.url ~ "^/stats" ||
req.url ~ "^/wc-api" ||
req.url ~ "^/wp-admin" ||
req.url ~ "^/wp-comments-post.php" ||
req.url ~ "^/wp-cron.php" ||
req.url ~ "^/wp-login.php" ||
req.url ~ "^/wp-activate.php" ||
req.url ~ "^/wp-mail.php" ||
req.url ~ "^/wp-login.php" ||
req.url ~ "^\?add-to-cart=" ||
req.url ~ "^\?wc-api=" ||
req.url ~ "^/preview=" ||
req.url ~ "^/\.well-known/acme-challenge/"
) {
set req.http.X-Cacheable = "NO:Logged in/Got Sessions";
if (req.http.X-Requested-With == "XMLHttpRequest") {
set req.http.X-Cacheable = "NO:Ajax";
}
return(pass);
}
# Remove any cookies left
unset req.http.Cookie;
return(hash);
}
sub vcl_hash {
if (req.http.X-Forwarded-Proto) {
# Create cache variations depending on the request protocol
hash_data(req.http.X-Forwarded-Proto);
}
}
sub vcl_backend_response {
# Inject URL & Host header into the object for asynchronous banning purposes
set beresp.http.x-url = bereq.url;
set beresp.http.x-host = bereq.http.host;
# If we dont get a Cache-Control header from the backend
# we default to 1h cache for all objects
if (!beresp.http.Cache-Control) {
set beresp.ttl = 1h;
set beresp.http.X-Cacheable = "YES:Forced";
}
# If the file is marked as static we cache it for 1 day
if (bereq.http.X-Static-File == "true") {
unset beresp.http.Set-Cookie;
set beresp.http.X-Cacheable = "YES:Forced";
set beresp.ttl = 1d;
}
if (beresp.http.Set-Cookie) {
set beresp.http.X-Cacheable = "NO:Got Cookies";
} elseif (beresp.http.Cache-Control ~ "private") {
set beresp.http.X-Cacheable = "NO:Cache-Control=private";
}
# The cached document should be "tagged" using the values returned by the x-graphql-keys header
if (beresp.http.x-graphql-keys) {
set beresp.http.xkey = beresp.http.x-graphql-keys;
}
}
sub vcl_deliver {
# Debug header
if (req.http.X-Cacheable) {
set resp.http.X-Cacheable = req.http.X-Cacheable;
} elseif (obj.uncacheable) {
if(!resp.http.X-Cacheable) {
set resp.http.X-Cacheable = "NO:UNCACHEABLE";
}
} elseif (!resp.http.X-Cacheable) {
set resp.http.X-Cacheable = "YES";
}
set resp.http.X-Cache-Hits = obj.hits;
if (obj.hits > 0) {
set resp.http.X-Cache = "HIT";
} else {
set resp.http.X-Cache = "MISS";
}
# Cleanup of headers
unset resp.http.x-url;
unset resp.http.x-host;
unset resp.http.via;
unset resp.http.x-varnish;
}
@predaytor do you have GraphQL Object Caching enabled or disabled?
Since you're using varnish, make sure to disable the GraphQL Object Caching.
Can you confirm that it is disabled when you experience this behavior?
@jasonbahl, confirm, I have Object Caching disabled.
@predaytor so if I understand correctly, you're querying for a post by slug, but that post is a draft? So you get a null response and no node ID in the cache key response?
How would the client know about that slug if the draft post wasn't already publicly visible?
Can you help me understand more about the flow?
Are you using authenticated requests to query draft posts?
@jasonbahl not exactly. The post has a status of Draft - everything is fine, the query projects gets MISS, without this drafted post), the query project by slug is also (data is null).
But after changing that particular post's status back to Published - the projects query gets a MISS (displaying this post among others, all is fine), but the project query for that slug remains with null data.
Creating a new post (which I assume has no status at all) works fine, all queries get MISS (purge works).
So the problem is only with a query for a specific post (querying by slug in the example) that has a status of Published after being in a status of Draft.
Authorization headers are not used during request execution (using Insomnia GET requests).
@predaytor ya, so my question is, how is the client making the query for the draft post by slug if the slug isn't known yet?
@jasonbahl, but I'm assuming that after changing the status to Published from Draft, the post should be publicly visible, including slug requests? Sorry if I'm not following correctly.
Ya, so it seems like you're querying for a draft post by slug:
query {
project( id: "draft-project" ) {
id
title
}
}
Since it's a draft, the query gets cached with null as the results, and no Node IDs are added to the cache keys because no nodes were resolved.
So, when the project is published and purge( 'list:post' ) is called, this query is not purged because it didn't have list:post or an individual node ID in the keys, so it remains cached.
So, my question remains, how does the client know about the "draft-project" slug if the draft project isn't publicly queryable?
Like what path is leading the client codebase to executing queries for draft projects by slug if the slug isn't actually known publicly because it's still a draft?
This is an interesting scenario, though.
Perhaps we need to track nodes that are resolved under the hood but not returned.
Like in this case, the draft node (project:123, for example) was resolved under the hood, but was not returned in the response because a public user cannot see drafts.
But, perhaps we need to output those un-returned Nodes in the X-GraphQL-Keys for cases like this where a node transitions from draft to published and back 🤔
Oh, I get it. Will try to investigate more!
I have a page with a list of all projects with links to each of them (projects query). For example, one publication received the status of Draft. The following query shows the list without this draft (MISS).
I use varnish soft-purge which revalidate the cache and shows the new version on the next request.
I've also enabled prefetching (using the Astro framework) which pre-renders the HTML, styles, etc. of a particular page on hover in the background (ie our project query is running).
Or the user already knows a certain link to the project (in bookmarks, etc.).
So, if the post is Draft, the user received zero data and request got cached (MISS). Changing the status back to Published does not revalidate the request, the next request will still be a HIT, should be MISS.
@predaytor ok, thanks for the info.
I think we might need to consider tracking ids of nodes that are resolved but not returned, as proposed here: https://github.com/wp-graphql/wp-graphql-smart-cache/issues/223#issuecomment-1601024471
🤔
fyi, I'm actively working on this. I think I might have a solution. Will try to get an update out soon! 🤞🏻
Thanks 🙏
Hi @jasonbahl, is there any update on this bug? I am having the same issue with draft and scheduled posts.
I thought I had a solution in July, but didn't quite work the way I hoped so it needs to be re-visited.
Definitely interested in contributions if anyone has ideas on how to support this scenario.
Since draft posts are not public, their IDs are never exposed, so their ID is not tracked in the X-GraphQL-Keys, and therefore a cached query for a draft post will not be tagged with the ID of the post, and will not be purged from the cache when that post is published and purge( $postId ) is called.
I'm not sure the best way to handle this.
We could maybe output the IDs of nodes that aren't public in the X-GraphQL-Keys header potentially, allowing for a query for a draft that returns null to still be tagged with the ID of the draft post. 🤔
My current thought is that if a query asks for node(s) and no nodes are returned, we could return a no-cache header.
🤔
Another idea is that we could return a node:null cache key, and then call purge( 'node:null' ) whenever publish events occur.
This would allow queries with null responses to remain cached until some event purges them, which would likely benefit the server compared to not caching at all. 🤔
i.e.
{
nodeByUri( uri: "non-existing-page" ) { ... }
}
would be cached.
Subsequent hits to that query would be cache: HIT, and the server would not have to attempt to resolve it.
Whenever a publish event (publishing a post being the most common one) we'd call purge( "node:null" ) (similar to how we call purge( "list:$type" ) and that would clear these caches.
So it's a balance again of allowing these types of queries to benefit from caching, but also benefit from being cache miss when potentially related actions have occurred.
This also would allow us to NOT expose IDs of non-public nodes in headers, as ideated here: https://github.com/wp-graphql/wp-graphql-smart-cache/issues/223#issuecomment-1745360382
👋🏻 This filter can be used:
add_filter( 'graphql_response_headers_to_send', function( $headers ) {
$request = \WPGraphQL\Router::get_request();
if ( ! $request instanceof \WPGraphQL\Request ) {
return $headers;
}
$query_analyzer = $request->get_query_analyzer();
if ( empty( $query_analyzer->get_runtime_nodes() ) ) {
$headers['Cache-Control'] = 'no-cache, must-revalidate';
}
return $headers;
}, 20, 1 );
This will return a 'no-cache, must-revalidate' header if no nodes are resolved in the request.
Trade Offs
With this filter in place queries for things that are not nodes, such as WordPress settings, would not be cached with this strategy.
However, currently settings are not being properly invalidated once they are cached. . .so whether this is addressed or not there's a "bug" with querying settings and caching at the moment.
Related issues:
- https://github.com/wp-graphql/wp-graphql-smart-cache/issues/158
- https://github.com/wp-graphql/wp-graphql/issues/2459
Thanks!