wp-parsely icon indicating copy to clipboard operation
wp-parsely copied to clipboard

Add Support to Trigger Article Crawls & Hook into Appropriate Actions

Open jblz opened this issue 3 years ago • 1 comments

There's an API endpoint to instruct the service to re-crawl an Article:

Normally, articles are crawled during the first 24 hours after they are published when the metadata is most likely to change. It is possible to notify our servers when posts get updated. To do that, submit a POST request to the following address, appending the URL of the post to the end of it. The <API_KEY> should be replaced with your respective Site ID (apikey), and <API_SECRET> with the token accessible to the account administrators via API Settings.

https://dash.parsely.com/<API_KEY>/ping_crawl?secret=<API_SECRET>&url=<URL>

If an article is changed after that period, it's not apparent how long it would be before it gets recrawled.

WordPress does not have built-in Webhook support (though there exist plugins to add support). However, it does have built in support for running a function when various actions occur. Should we consider adding such support?

A potential (untested, somewhat pseudo-coded) approach to this might look like:

function wp_parsely_trigger_page_crawl( $post_id ) {
  if ( $POST_IS_NEWER_THAN_THRESHOLD ) {
     return;
  }
  $permalink = get_post_permalink( $post );
  wp_remote_post( get_ping_crawl_url( $permalink ), [ 'blocking' => false ] );
}
add_action( 'post_updated', 'wp_parsely_trigger_page_crawl' );

We should probably add a filter that lets users at least opt out of this behavior (if not opt in)

jblz avatar Apr 30 '21 21:04 jblz

Find more info in #1073

mehmoodak avatar Dec 30 '22 09:12 mehmoodak