apache-ultimate-bad-bot-blocker
[ADD/REMOVE] Seekport Crawler is a misbehaving bot.
Is this an Addition / Removal Request? Addition. Please and thank you!
Please List the User-Agent string or Referrer to be added/removed
example: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/webkit-version (KHTML, like Gecko) Silk/browser-version like Chrome/chrome-version Safari/webkit-version
Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)
Please explain why it should be added/removed
Bot / User Agent doesn't respect our current robots.txt, .htaccess, or even the Apache bad bot settings whatsoever, and winds up toppling servers with an insane volume of constant requests.
We had to block the offending IP at the firewall level.
For Additions: Please include a log sample 3-5 lines is adequate
I've sanitized the institution domain, but you get the drift. The IP is real. We're running Apache in a Docker container, which works really well. Big fans of your software.
The requests first come through a Traefik Docker container (reverse proxy), which then forwards them to the Apache container running the Bad Bot Blocker. Blocking by user agent typically does the trick right away, but apparently not this time.
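User-agent rules should keep working behind the proxy, since Traefik passes the User-Agent header through to Apache unchanged. IP-based rules are different: Apache sees the proxy's address as the client unless mod_remoteip restores the original address from X-Forwarded-For. The following is a minimal sketch under stated assumptions: mod_remoteip is enabled, and the 192.168.80.0/20 proxy subnet is only a guess based on the backend address (http://192.168.80.9:80) in the logs below, so it must be replaced with the real Traefik address or network. The `<Location>` wrapper is illustrative. With the client IP restored, the address that had to be blocked at the firewall could also be refused by Apache itself.

```apache
# Assumption: mod_remoteip is loaded in the Apache container.
# Traefik passes the original client address in X-Forwarded-For.
RemoteIPHeader X-Forwarded-For

# Assumption: Traefik sits on the same Docker network as the backend
# (192.168.80.9 in the logs); replace with the proxy's real address or subnet.
RemoteIPInternalProxy 192.168.80.0/20

<Location "/">
    <RequireAll>
        Require all granted
        # Apache-level equivalent of the firewall block; only effective
        # once mod_remoteip has replaced the proxy IP with the client IP.
        Require not ip 65.21.180.83
    </RequireAll>
</Location>
```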
isle-proxy-prod | 65.21.180.83 - - [13/Oct/2021:17:49:06 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1901-01-01T00:00:00Z%20TO%201911-01-01T00:00:00Z] HTTP/1.1" 200 45570 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5533 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 5456ms
isle-proxy-prod | 65.21.180.83 - - [13/Oct/2021:17:49:07 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1911-01-01T00:00:00Z%20TO%201921-01-01T00:00:00Z] HTTP/1.1" 200 47252 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5534 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 5193ms
isle-proxy-prod | 65.21.180.83 - - [13/Oct/2021:17:49:08 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1921-01-01T00:00:00Z%20TO%201931-01-01T00:00:00Z] HTTP/1.1" 200 45724 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5535 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 5119ms
isle-proxy-prod | 65.21.180.83 - - [13/Oct/2021:17:49:11 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1941-01-01T00:00:00Z%20TO%201971-01-01T00:00:00Z] HTTP/1.1" 200 50603 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5537 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 4921ms
Any other important information to consider
Despite adding the following to blacklist-user-agents.conf, 3/4 of the requests were still coming through. That is also odd because some requests to the site homepage were blocked, while the more elaborate requests pushed through.
# Custom - 10/13 Seekport Crawler - http://seekport.com/
BrowserMatchNoCase "^(.*?)(\bSeekport\ Crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport\ crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bSeekport Crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bSeekport\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport\.com\b)(.*)$" bad_bot
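Since BrowserMatchNoCase already matches case-insensitively, the upper- and lowercase variants in the list above are redundant, and a single word-boundary pattern also matches the seekport.com URL inside the UA string. Note that BrowserMatchNoCase only sets the bad_bot environment variable; a request is refused only where a Require rule checks that variable. The sketch below assumes Apache 2.4 authorization syntax; the `<Location "/">` wrapper is a hypothetical placement, not necessarily how the blocker's own include files apply bad_bot.

```apache
# Custom - Seekport Crawler - http://seekport.com/
# BrowserMatchNoCase ignores case, so one pattern covers "Seekport",
# "seekport", "Seekport Crawler", and the seekport.com URL in the UA.
BrowserMatchNoCase "^(.*?)(\bSeekport\b)(.*)$" bad_bot

# Setting bad_bot only tags the request; an authorization rule must
# still refuse it (hypothetical placement, Apache 2.4 syntax).
<Location "/">
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Location>
```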