apache-ultimate-bad-bot-blocker

Allow that robots.txt and perhaps other URI are accessible

Open magicdude4eva opened this issue 8 years ago • 6 comments

I just started having a look at your project and it really looks good. The one thing I am missing is that certain resources (such as /robots.txt) should still be accessible.

For example, Ahrefs (https://ahrefs.com/robot) honours robots.txt but is blocked by the globalblacklist. Ideally, certain resources on the VirtualHost (such as /robots.txt) would still be accessible to such bots.

magicdude4eva • Jul 18 '17 10:07

Hi @magicdude4eva, thanks for the feedback. Feature changes now in progress will let you whitelist bots that are listed in the bad bots section, overriding the block (see https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/issues/34). Hang tight while these changes are completed.

mitchellkrogza • Jul 18 '17 11:07

Sounds great. I still think it would be a good idea, though, to let bots access certain resources such as /robots.txt.

BTW: Good to see a fellow SA on Github (does not happen often)

magicdude4eva • Jul 18 '17 13:07

Thanks @magicdude4eva and yes also great to see another fellow South African on here.

I will see what logic I can work out for robots.txt. I must first get all my Travis scripts online so that the repo becomes self-generating; then I can work on some modifications to the templates. Once the Travis CI build scripts are in place, I will also be pushing out two versions: one for Apache 2.2 using the old access control methods, and one for Apache 2.4 using the new Apache 2.4 access control methods, which won't require mod_access_compat anymore (as per https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/issues/32).

Thanks for the input and feedback 👍 Lots of changes still coming... one step at a time 😀

mitchellkrogza • Jul 18 '17 13:07

@mitchellkrogza With Apache 2.4.26 you can use the If directives. Unfortunately, CentOS 7 (7.3.1611) only ships with Apache/2.4.25 (2 March 2017), so If is a no-go for now.

The only workaround I managed to find is a negative LocationMatch, which applies the bot filter to all resources except robots.txt. This works fine:

  #########
  # Block all web bots (we return 403s), except for /robots.txt
  <LocationMatch "^/(?!robots\.txt$)">
    Include /home/apache/botblocking/globalblacklist.conf
  </LocationMatch>
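To see exactly which paths a negative lookahead like this exempts, here is a minimal Python sketch. It assumes Python's `re` module as a stand-in for the PCRE engine Apache uses for `<LocationMatch>` (for this pattern the two behave the same). It compares the unescaped, unanchored form `^/(?!robots.txt)` with a stricter `^/(?!robots\.txt$)`; the helper name `blacklist_applies` and the sample paths are hypothetical and not part of the bot blocker.

```python
import re

# Assumption: Python's re handles this negative lookahead the same way
# as the PCRE engine Apache uses for <LocationMatch>.

# Unescaped, unanchored form: "." matches any character and nothing
# forces the path to end after "robots.txt".
ORIGINAL = re.compile(r"^/(?!robots.txt)")

# Stricter form: literal dot, and the exemption only applies when the
# whole path is exactly /robots.txt.
STRICT = re.compile(r"^/(?!robots\.txt$)")


def blacklist_applies(pattern: re.Pattern, path: str) -> bool:
    """Hypothetical helper: True if the <LocationMatch> block (and so the
    included blacklist) would be applied to this URL path."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in ["/", "/index.html", "/robots.txt", "/robots.txt.bak", "/robotsXtxt"]:
        print(
            f"{path:16} original={blacklist_applies(ORIGINAL, path)} "
            f"strict={blacklist_applies(STRICT, path)}"
        )
```

Both forms exempt /robots.txt itself, but the unanchored form also lets paths such as /robots.txt.bak, or /robotsXtxt (the unescaped dot matches any character), skip the blacklist. Escaping the dot and anchoring with `$` limits the exemption to /robots.txt only.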

magicdude4eva • Jul 24 '17 14:07

@magicdude4eva Weird, I've tested the blocker on versions of Apache from 2.2 to 2.4.27. Which version are you using, Apache_2.2 or Apache_2.4?

mitchellkrogza • Jul 24 '17 14:07

I'd have to see how I can implement this: first allowing anything to access robots.txt (as per your example), and then moving on to other sections of the blocker.

I'm busy with a lot of documentation updates right now, having just finished bringing all the Travis CI generator and testing scripts online. This requires a lot of doc changes because of the two distinctly different versions: Apache_2.2 (for 2.2 through 2.4+, but needs the access_compat module) and Apache_2.4 (no access_compat needed).

Once that is done, I am going to start working on V4, which will be quite different, with a new layout and a switch file where users can enable and disable certain parts of the blocker. For example, they could turn OFF user-agent checks but keep referrer checks ON. That will be a breaking change, so it will probably be released in a new branch until it's 100% tested, but that's only coming in a few weeks' time.

Travis is very strict with testing, and both versions pass all tests. Unfortunately, making Travis check each version against multiple versions of Apache is a lot of work right now (currently I have Travis using 2.4.27).

It's not impossible, but it requires a lot of scripting in the build process: install a version > test > uninstall > install another version > test > uninstall, and so on. Certainly something to add to the to-do list at some point.

mitchellkrogza • Jul 24 '17 15:07