nrobots doesn't handle entries like /?/ properly
nrobots doesn't handle entries like "Disallow: /?/" properly.
It looks like NRobots is converting "Disallow: /?/" into "Disallow: /", which disallows the entire site. The last check-in to that library fixed a similar issue, but this entry evidently still causes problems.
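To illustrate why that conversion matters, here is a minimal Python sketch. It is not NRobots code and makes no claim about how NRobots parses rules internally; `is_disallowed` is a hypothetical helper that does plain prefix matching on the path plus query string, which is enough to show the difference between the rule as written and the rule after it has been reduced to "/".

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if path starts with any non-empty Disallow rule.

    Hypothetical helper: plain prefix matching only, no wildcard or
    Allow handling. `path` here means the URL path plus query string.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


literal_rules = ["/?/"]  # the rule as written in robots.txt
mangled_rules = ["/"]    # the rule after being reduced to "/"

for path in ["/", "/about", "/?/search"]:
    print(path, is_disallowed(path, literal_rules), is_disallowed(path, mangled_rules))
```

With the literal rule only "/?/search" is blocked, while the reduced rule blocks every path, including the root of the crawl.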
Even though this bug still exists in the NRobots library, Abot now provides a workaround for this and similar issues where robots.txt prevents the crawl. See https://github.com/sjdirect/abot/commit/9bd3d7d91ebefb6e03ee2c2a1b5140cc4020073c for details. There is now an isIgnoreRobotsDotTextIfRootDisallowedEnabled config value that, if set to true, ignores the robots.txt file when the root URI of the crawl is disallowed.
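The decision that isIgnoreRobotsDotTextIfRootDisallowedEnabled describes can be sketched roughly as follows. This is Python for illustration only, not Abot's actual implementation (see the linked commit for that). `CrawlSettings`, `should_apply_robots_txt`, and the `respect_robots_txt` flag are hypothetical names assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class CrawlSettings:
    # Hypothetical stand-ins for crawler config values, not Abot's real types.
    respect_robots_txt: bool = True
    ignore_robots_txt_if_root_disallowed: bool = False


def should_apply_robots_txt(settings: CrawlSettings, root_is_disallowed: bool) -> bool:
    """Decide whether robots.txt rules should be enforced for this crawl."""
    if not settings.respect_robots_txt:
        return False
    # Workaround: if robots.txt blocks the crawl's root URI and the flag
    # is set, ignore robots.txt entirely instead of aborting the crawl.
    if root_is_disallowed and settings.ignore_robots_txt_if_root_disallowed:
        return False
    return True


workaround = CrawlSettings(ignore_robots_txt_if_root_disallowed=True)
print(should_apply_robots_txt(workaround, root_is_disallowed=True))
print(should_apply_robots_txt(CrawlSettings(), root_is_disallowed=True))
```

With the flag on, a robots.txt that disallows the root is skipped and the crawl proceeds. With the flag off, robots.txt is still enforced and the crawl is blocked, as before.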