robotstxt
404 on robots.txt should fail to allow, not error
I'm thinking that a 404 on a robots.txt file should fall back to an always-allow state. I'm not sure whether there is a standard behavior a bot should follow when the file is missing.
I made a small patch for this for a project I'm working on but wanted your opinion on it before I submit a pull request.
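The patch itself isn't attached here, but as a rough sketch of the fail-open behaviour being proposed (plain Python with `urllib`, with a made-up name like `fetch_robots_txt`; this is not the package's actual code or necessarily its language), a fetch could return an empty rule set on a 404 instead of raising:

```python
import urllib.error
import urllib.request


def fetch_robots_txt(site: str) -> str:
    """Fetch a site's robots.txt, treating a 404 as "no rules" rather than an error.

    An empty string parses as an empty rule file, which disallows nothing,
    so the crawler falls back to an always-allow state.
    """
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Missing robots.txt: return an empty rule set instead of raising.
            return ""
        raise  # other HTTP errors still surface to the caller


rules = fetch_robots_txt("https://example.com")
print(rules or "(no robots.txt, crawling everything is allowed)")
```

The key design choice is that only a 404 is swallowed; other HTTP errors still propagate, so a genuinely broken fetch doesn't silently turn into "crawl everything".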
I've had to make the same patch, and I agree it's definitely needed. The codebase shouldn't assume that every website has a robots.txt, when the accepted default in the wild is that sites only publish one if there's something they'd prefer wasn't crawled.