cachify

Feature request: Test compatibility with "Blackhole for Bad Bots"

Open Zodiac1978 opened this issue 2 years ago • 4 comments

Jeff Starr has this plugin: https://wordpress.org/plugins/blackhole-bad-bots/

It adds a hidden link that is also disallowed via robots.txt. If a bot crawls this link anyway, the plugin treats it as a bad bot and blocks its IP address.

The problem is that it is not compatible with every caching plugin: it relies on hooks that never fire when only the cached HTML is served.

See https://wordpress.org/plugins/blackhole-bad-bots/#installation for the problem description and https://plugin-planet.com/blackhole-pro-cache-plugins/ for the list of compatible plugins.

Let's test it and hopefully we can make Cachify compatible.

Zodiac1978 avatar Jul 22 '23 10:07 Zodiac1978

If you configure Cachify to generate static HTML files that are served by your webserver or some caching CDN directly, I don't see any elegant way to achieve this from the plugin's perspective.

With cached content served by the plugin itself, this should be possible. All of the listed compatible plugins have to be configured so that the cached content is sent to the output only after Blackhole Bad Bots has taken action.

Currently, Cachify hooks into template_redirect, which is pretty early.

We should

  • check in which phase Blackhole Bad Bots does its stuff
  • depending on the answer, see whether we might introduce something like a "late init" switch to optionally use a later phase

For the majority of use cases the earlier the better, as it reduces latency and computational overhead, so if we need to do things later for compatibility, I'd prefer a switch for that.
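The first bullet could be checked empirically with a small diagnostic that logs whether init has already fired at the point where Cachify acts. This is only a minimal sketch and not part of either plugin: cachify_example_describe_hook is a hypothetical helper name, and priority 0 is an arbitrary early choice. add_action and did_action are standard WordPress functions.

```php
<?php
/**
 * Hypothetical helper: describe whether a hook has already fired.
 *
 * @param string $hook        Hook name, e.g. 'init'.
 * @param int    $fired_count Value that did_action( $hook ) returned.
 * @return string Human-readable one-line report.
 */
function cachify_example_describe_hook( string $hook, int $fired_count ): string {
    return $fired_count > 0
        ? sprintf( '%s already fired (%d time(s))', $hook, $fired_count )
        : sprintf( '%s has not fired yet', $hook );
}

// Only register the diagnostic when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action(
        'template_redirect',
        function () {
            // Report the state of 'init' at the time template_redirect runs.
            error_log( cachify_example_describe_hook( 'init', did_action( 'init' ) ) );
        },
        0 // Assumption: an early priority, chosen only for illustration.
    );
}
```

Dropped into an MU plugin, this would write one line per request to the PHP error log, which should show whether a later phase is needed at all when the plugin serves the cache itself.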

stklcode avatar Jul 22 '23 10:07 stklcode

> check in which phase Blackhole Bad Bots does its stuff

The most recent version is 3.6: https://plugins.trac.wordpress.org/browser/blackhole-bad-bots/tags/3.6 (No GitHub repo.)

Zodiac1978 avatar Jul 22 '23 10:07 Zodiac1978

> check in which phase Blackhole Bad Bots does its stuff

In the linked page I saw this explanation:

> With page caching, the required init hook may not be fired, which means that plugins like Blackhole for Bad Bots are not able to check the request to see if it should be blocked.

https://plugins.trac.wordpress.org/browser/blackhole-bad-bots/tags/3.6/blackhole.php#L77

Maybe we could trigger blackhole_scanner ourselves if we detect the plugin instead of changing the hook?

Or we could look at those other caching plugins mentioned and how they achieve compatibility.
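The first idea, triggering the scanner directly, might look roughly like the sketch below. It rests on assumptions not verified in this thread: that blackhole_scanner is a global function callable without arguments, and that it normally runs on init, so it is skipped when init has already fired. cachify_example_maybe_run_blackhole is a hypothetical name, not an existing Cachify function.

```php
<?php
/**
 * Hypothetical helper: run Blackhole's scanner before serving a cached page.
 *
 * Assumption: blackhole_scanner() takes no required arguments and is
 * normally attached to the 'init' hook by Blackhole for Bad Bots.
 *
 * @return bool True if the scanner was called here, false otherwise.
 */
function cachify_example_maybe_run_blackhole(): bool {
    // Blackhole for Bad Bots is not active: nothing to do.
    if ( ! function_exists( 'blackhole_scanner' ) ) {
        return false;
    }

    // If 'init' has already fired, the scanner has presumably run already;
    // avoid calling it a second time.
    if ( function_exists( 'did_action' ) && did_action( 'init' ) > 0 ) {
        return false;
    }

    blackhole_scanner();
    return true;
}
```

Cachify would call this just before sending cached output. The function_exists check keeps the call harmless when Blackhole is not installed.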

Zodiac1978 avatar Jul 22 '23 10:07 Zodiac1978

> Or we could look at those other caching plugins mentioned and how they achieve compatibility.

After reading some more, those plugins seem to either have a late init option or allow full-page caching to be disabled entirely.

Two plugins can be "fixed" by adding an MU plugin containing only this code:

function blackhole_verify_nonce($verify) { return true; }
add_filter('blackhole_verify_nonce', 'blackhole_verify_nonce');

This looks like it hard-codes the result of the nonce check to true.

Zodiac1978 avatar Jul 22 '23 11:07 Zodiac1978