django-robots
Microsoft Bing's Robots.txt Tester says it does not accept a Disallow line with an empty value
By default, django-robots generates the following:
User-agent: *
Disallow:
Sitemap: https://mysite.com/sitemap.xml
Bing's Robots.txt Tester reports an error on line 2, the empty Disallow: line.
Bing's tester may flag line 2 of the default robots.txt generated by django-robots because the Disallow directive has an empty value. Under the Robots Exclusion Protocol an empty Disallow is technically valid (it means nothing is disallowed), but Bing's tester expects each Disallow directive to name a path, or a pattern of paths, that the listed user agents must not crawl.
To clear the error in Bing's tester, either remove the Disallow directive entirely or give it a path (or path pattern) to disallow. For example, to block crawling of all pages under /admin/, modify robots.txt as follows:
User-agent: *
Disallow: /admin/
Sitemap: https://mysite.com/sitemap.xml
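If the intent of the default file was to allow crawling of everything, which is what the empty Disallow meant, an explicit Allow rule is an alternative worth trying; Bing documents support for the Allow directive, but confirm the result in its tester:
User-agent: *
Allow: /
Sitemap: https://mysite.com/sitemap.xml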
It's possible to modify the default output of django-robots to meet Bing's requirement. Instead of a single empty Disallow:, list each path you want to block:
User-agent: *
Disallow: /admin/
Disallow: /secret_page/
Disallow: /unpublished_articles/
Sitemap: https://mysite.com/sitemap.xml
Each Disallow line now names a specific path rather than carrying an empty value. Customize this list of paths as needed for your site.
A note on implementation: django-robots has no ROBOTS_DISALLOWED_URLS setting; it stores its rules in the database as Rule and Url objects, which you normally manage through the Django admin. You can also create the equivalent rule programmatically, for example from the Django shell or a data migration:
from django.contrib.sites.models import Site
from robots.models import Rule, Url

# One rule group applying to all user agents on the current site
rule = Rule.objects.create(robot='*')
rule.sites.add(Site.objects.get_current())

# Attach each path pattern that should be disallowed
for pattern in ['/admin/', '/secret_page/', '/unpublished_articles/']:
    url, _ = Url.objects.get_or_create(pattern=pattern)
    rule.disallowed.add(url)
Once these Rule and Url entries exist, django-robots renders a robots.txt in the format shown above, with one Disallow line per pattern. Adjust the list of patterns to match the pages you want to block.
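As a quick check, here is a minimal sketch that fetches the generated file with Django's test client, assuming robots.urls is mounted at /robots.txt in your URLconf as shown in the django-robots README:
from django.test import Client

# Run from a test case or the Django shell
client = Client()
response = client.get('/robots.txt')
print(response.content.decode())

# Expected output, roughly:
# User-agent: *
# Disallow: /admin/
# Disallow: /secret_page/
# Disallow: /unpublished_articles/
# Sitemap: https://mysite.com/sitemap.xml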