
Feature request: @block_robots decorator for views

Open groovecoder opened this issue 9 years ago • 3 comments

It would be nice if django-robots included a decorator to block robots from views based on User-agent (like robots.txt). It would help django apps outright prevent robots - even mis-behaving ones that don't follow robots.txt - from accessing views that they shouldn't.
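Something like this, roughly (the import path and the user_agents argument here are purely illustrative -- nothing like it exists in django-robots yet):

# hypothetical usage sketch only -- not an existing django-robots API
from robots.decorators import block_robots

@block_robots(user_agents=['BadBot', 'EvilScraper/2.0'])
def members_area(request):
    ...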

groovecoder avatar Aug 27 '15 14:08 groovecoder

:+1:

SalahAdDin avatar Aug 27 '15 15:08 SalahAdDin

I think it's out of the scope of this application. IMHO a decorator that blocks "rogue" robots from accessing a view is an application in itself, as you need to implement and maintain the list of robot UA strings (and even then I doubt 'rogue' robots use a specific UA string).

yakky avatar Dec 22 '15 07:12 yakky

  1. In your Django app directory (the same package as your views.py), create a new file called block_robots.py with the following code:
import re
from functools import wraps
from django.http import HttpResponseForbidden

def block_robots(view_func):
    @wraps(view_func)
    def _wrapped_view(request, *args, **kwargs):
        # Update the list of blocked user agents accordingly
        blocked_agents = [
            'Googlebot',
            'Bingbot',
            'Slurp',
            'DuckDuckBot',
            'Baiduspider',
            'YandexBot',
            'Sogou',
            'Exabot',
            'Facebot',
            'ia_archiver'
        ]
        user_agent = request.META.get('HTTP_USER_AGENT', "")

        if any(re.search(agent, user_agent, re.IGNORECASE) for agent in blocked_agents):
            return HttpResponseForbidden("Forbidden for robots")

        return view_func(request, *args, **kwargs)
    return _wrapped_view

  2. Now, you can use the @block_robots decorator in your views.py:
from django.http import HttpResponse
from .block_robots import block_robots

@block_robots
def my_protected_view(request):
    return HttpResponse("This view is protected from robots.")

This code defines a block_robots decorator that first checks whether the User-agent of the incoming request matches any of the blocked agents in the list. If a match is found, an HTTP 403 Forbidden response is returned. If no match is found, the request is allowed to continue to the wrapped view.

Feel free to customize the list of blocked agents according to your requirements. The code uses re.search with re.IGNORECASE, so matching is partial and case-insensitive, and entries in the blocked agents list can be full regular expression patterns rather than just literal bot names.
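For example, a customized list could mix literal names with broader patterns (these particular patterns are only illustrative):

# Illustrative entries only -- adjust to your own requirements
blocked_agents = [
    r'Googlebot',          # literal name, matched anywhere in the UA string
    r'bot\b',              # any token ending in "bot" (Bingbot, DuckDuckBot, ...)
    r'crawl(er|ing)?',     # generic crawlers
    r'SemrushBot/\d+',     # a specific versioned crawler
]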

Remember that even though this workaround prevents misbehaving bots from accessing your views, the ideal method of restricting access is still employing a properly configured robots.txt file.
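A quick way to sanity-check the decorator, assuming the block_robots.py and views.py from the steps above are importable and Django settings are configured (the module path and user-agent strings are just samples):

from django.test import RequestFactory

from myapp.views import my_protected_view  # wherever the decorated view lives

factory = RequestFactory()

# A request identifying itself as Googlebot should be rejected with 403
bot_request = factory.get('/', HTTP_USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1)')
assert my_protected_view(bot_request).status_code == 403

# An ordinary browser user-agent should still reach the view
browser_request = factory.get('/', HTTP_USER_AGENT='Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0')
assert my_protected_view(browser_request).status_code == 200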


Here is sample code to create a custom decorator @block_robots that will block robots from views based on user-agent:

# views.py
from functools import wraps

from django.conf import settings
from django.http import HttpResponse, HttpResponseForbidden


def robot_blocked(user_agent):
    # True if any entry from settings.BLOCKED_ROBOTS appears in the user-agent string
    blocked_robots = getattr(settings, 'BLOCKED_ROBOTS', [])
    user_agent = user_agent.lower()
    return any(robot in user_agent for robot in blocked_robots)


def block_robots(view_func):
    # Decorator that rejects requests from blocked robots before the view runs
    @wraps(view_func)
    def _wrapped_view(request, *args, **kwargs):
        if robot_blocked(request.META.get('HTTP_USER_AGENT', '')):
            return HttpResponseForbidden()
        return view_func(request, *args, **kwargs)
    return _wrapped_view


def my_view(request):
    # view logic here
    return HttpResponse('This is my view!')


@block_robots
def my_view_with_robot_block(request):
    # view logic here
    return HttpResponse('This is my view with robot block!')

You would need to define the BLOCKED_ROBOTS list in your Django settings file with the user-agent substrings of the robots you want to block. You can then apply the @block_robots decorator to any view you want to protect; views without it, like my_view above, are unaffected.

Here's an example of how you could define the BLOCKED_ROBOTS in your Django settings file:

settings.py

BLOCKED_ROBOTS = [
    'googlebot',
    'bingbot',
    'yahoo',
    # add more robots here as needed
]

Note that this example is case-insensitive, so any user-agent string containing "googlebot" will be blocked, regardless of whether it's spelled in uppercase or lowercase letters. If you want to make it case-sensitive, you can remove the lower() method in the robot_blocked function.
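To try the settings-driven check without editing your real settings file, override_settings can be used from a Django shell or test (my_view_with_robot_block is the decorated view from the views.py example above; the user-agent strings are just samples):

from django.test import RequestFactory, override_settings

factory = RequestFactory()

with override_settings(BLOCKED_ROBOTS=['googlebot', 'bingbot', 'yahoo']):
    # Blocked: the lower-cased user-agent contains 'googlebot'
    bot = factory.get('/', HTTP_USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1)')
    assert my_view_with_robot_block(bot).status_code == 403

    # Allowed: no entry from BLOCKED_ROBOTS appears in this user-agent
    browser = factory.get('/', HTTP_USER_AGENT='Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0')
    assert my_view_with_robot_block(browser).status_code == 200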

some1ataplace avatar Mar 27 '23 20:03 some1ataplace