robotstxt
404 on robots.txt should fail to allow, not error
I'm thinking that a 404 on a robots.txt file should fall back to an always-allow state. I'm not sure whether there is a standard behavior a bot should follow when the file is missing.
I made a small patch for this for a project I'm working on but wanted your opinion on it before I submit a pull request.
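The patch itself isn't attached here, but as a rough sketch of the fail-open behaviour being proposed (plain Python with `urllib`, with a made-up name like `fetch_robots_txt`; this is not the package's actual code or necessarily its language), a fetch could return an empty rule set on a 404 instead of raising:

```python
import urllib.error
import urllib.request


def fetch_robots_txt(site: str) -> str:
    """Fetch a site's robots.txt, treating a 404 as "no rules" rather than an error.

    An empty string parses as an empty rule file, which disallows nothing,
    so the crawler falls back to an always-allow state.
    """
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Missing robots.txt: return an empty rule set instead of raising.
            return ""
        raise  # other HTTP errors still surface to the caller


rules = fetch_robots_txt("https://example.com")
print(rules or "(no robots.txt, crawling everything is allowed)")
```

The key design choice is that only a 404 is swallowed; other HTTP errors still propagate, so a genuinely broken fetch doesn't silently turn into "crawl everything".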
I've had to make the same patch, and I agree it's definitely needed. The codebase shouldn't assume that every website has a robots.txt, when the accepted default in the wild is that sites only publish one if there's something they'd prefer wasn't crawled.