Microsoft Bing Robots.txt Tester does not accept a Disallow line with an empty value

Open tony95271 opened this issue 4 years ago • 1 comments

By default, django-robots generates the following:

User-agent: *
Disallow:

Sitemap: https://mysite.com/sitemap.xml

Bing Robots.txt Tester reports an error on line 2.

tony95271 avatar Aug 04 '20 12:08 tony95271

Microsoft Bing Robots.txt Tester reports an error on line 2 of the default robots.txt generated by django-robots because that line is a Disallow directive with an empty value. Under the Robots Exclusion Protocol an empty Disallow value is valid and means that nothing is disallowed, so the default file permits crawling of the whole site. Bing's tester nevertheless appears to expect a path or a pattern of paths after the directive.
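One way to see how a parser interprets the empty Disallow line is Python's standard-library urllib.robotparser, which treats it as "nothing is disallowed". The following is only a minimal sketch: the helper name can_crawl is made up for illustration, and it says nothing about how Bing's tester itself behaves.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`.

    Hypothetical helper for illustration; not part of django-robots.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Default django-robots output quoted in the issue (empty Disallow).
DEFAULT_ROBOTS = """\
User-agent: *
Disallow:

Sitemap: https://mysite.com/sitemap.xml
"""

# Variant with an explicit path after Disallow.
ADMIN_BLOCKED = """\
User-agent: *
Disallow: /admin/

Sitemap: https://mysite.com/sitemap.xml
"""

if __name__ == "__main__":
    # Empty Disallow: the admin URL is still crawlable.
    print(can_crawl(DEFAULT_ROBOTS, "https://mysite.com/admin/"))
    # Explicit path: the admin URL is blocked.
    print(can_crawl(ADMIN_BLOCKED, "https://mysite.com/admin/"))
```

Running it should print True for the default file and False for the variant, which suggests the two files differ in meaning, not only in whether a tester accepts them.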

To fix the error reported by Bing Robots.txt Tester, you can either remove the Disallow directive altogether, or specify a path or a pattern of paths that should be disallowed. For example, if you want to disallow crawling of all pages under the /admin/ path, you can modify the robots.txt file as follows:

User-agent: *
Disallow: /admin/

Sitemap: https://mysite.com/sitemap.xml


It's possible to modify the default output of django-robots to address Bing's requirement.

Instead of simply specifying Disallow:, you can list out all the pages you want to disallow access to:

User-agent: *
Disallow: /admin/
Disallow: /secret_page/
Disallow: /unpublished_articles/

Sitemap: https://mysite.com/sitemap.xml

This format includes a specific path to disallow, rather than just having an empty value. You can customize this list of paths as needed for your site.
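If you build the file yourself, the format can be sketched as a small rendering function. This is an illustration only: render_robots and its parameters are hypothetical names, not django-robots API, and the fallback to "Allow: /" when no paths are given is an assumption about what a strict tester would accept in place of an empty Disallow.

```python
from typing import Iterable, Optional


def render_robots(
    disallowed: Iterable[str],
    sitemap_url: Optional[str] = None,
    user_agent: str = "*",
) -> str:
    """Render robots.txt text without ever emitting an empty Disallow line.

    Hypothetical helper for illustration; not part of django-robots.
    """
    lines = [f"User-agent: {user_agent}"]

    # Drop blank entries so no "Disallow:" line ends up without a value.
    paths = [p.strip() for p in disallowed if p.strip()]
    if paths:
        lines.extend(f"Disallow: {p}" for p in paths)
    else:
        # Assumption: an explicit "Allow: /" stands in for the empty Disallow.
        lines.append("Allow: /")

    if sitemap_url:
        # Blank line before the Sitemap entry, as in the examples above.
        lines.extend(["", f"Sitemap: {sitemap_url}"])

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_robots(
            ["/admin/", "/secret_page/", "/unpublished_articles/"],
            sitemap_url="https://mysite.com/sitemap.xml",
        )
    )
```

Keeping the paths in a plain list makes it easy to adjust them per site, and the fallback branch keeps the output free of a bare Disallow line even when the list is empty.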

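To implement this in django-robots, create the rules in the database rather than in settings. As far as the package's documentation describes, rules are stored as Rule and Url records that you add through the Django admin, and the default output quoted in the issue is what you get when no rule exists for the site. ROBOTS_DISALLOWED_URLS is not among the settings django-robots documents, so a dictionary like the following has no effect unless your own code reads it:

ROBOTS_DISALLOWED_URLS = {
    'User-agent': {
        'Disallow': [
            '/admin/',
            '/secret_page/',
            '/unpublished_articles/',
        ],
    },
}

With a rule for the user agent * that disallows these three URL patterns, django-robots should generate a robots.txt file in the format shown above. You can adjust the list of patterns to match the pages you want to block.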

some1ataplace avatar Mar 27 '23 21:03 some1ataplace