isDisallowed() returns true for / despite no matching Disallow rule in robots.txt
Hi,
I'm encountering a possible issue with the way isDisallowed() behaves when parsing the robots.txt file from https://www.natureetdecouvertes.com/robots.txt.
Context

I'm checking if crawling the homepage / is allowed for a standard browser User-Agent like:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36

When I run:

const isAllowed = robotsTxt.isAllowed('https://www.natureetdecouvertes.com/', userAgent);
const isDisallowed = robotsTxt.isDisallowed('https://www.natureetdecouvertes.com/', userAgent);

I get:

isAllowed === undefined
isDisallowed === true

However, the robots.txt does not contain any rule explicitly disallowing /. There is no Disallow: /, and the default behavior according to the Robots Exclusion Protocol (RFC 9309) is to allow access to / unless explicitly blocked.
Expected behavior

isDisallowed('/') should return false, and isAllowed('/') should return true (or at least not undefined if a fallback to User-agent: * applies).
Notes

The user-agent I'm testing is not listed in any specific User-agent: group, so the fallback to User-agent: * should apply.
There is no Disallow: / or any wildcard rule that matches only /.
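For reference, here is a minimal sketch of the behavior I'd expect, using a simplified stand-in for the robots.txt (the Disallow rules here are just illustrative, not the real file from the site):

const robotsParser = require('robots-parser');

// Simplified stand-in for the real robots.txt: a User-agent: * group whose
// Disallow rules do not match "/".
const contents = [
  'User-agent: *',
  'Disallow: /checkout/',
  'Disallow: /account/',
].join('\n');

const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36';
const robots = robotsParser('https://www.natureetdecouvertes.com/robots.txt', contents);

// The UA is not listed explicitly, so the "*" group applies; since no
// Disallow rule matches "/", RFC 9309 defaults to allowing it.
console.log(robots.isAllowed('https://www.natureetdecouvertes.com/', userAgent));    // expected: true
console.log(robots.isDisallowed('https://www.natureetdecouvertes.com/', userAgent)); // expected: false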
Can you confirm whether this is intended behavior? If not, it looks like a parsing bug or a fallback-handling issue.
Thanks!
Thanks for reporting!
I've just tried to reproduce this, but it gives the expected result. Could you see if I'm doing anything differently:
const robotsParser = require('robots-parser');
const contents = '...robots.txt contents...';
const userAgent = 'SomeBot';
const robots = robotsParser('https://www.natureetdecouvertes.com/robots.txt', contents);
const isAllowed = robots.isAllowed('https://www.natureetdecouvertes.com/', userAgent);
const isDisallowed = robots.isDisallowed('https://www.natureetdecouvertes.com/', userAgent);
console.log(`IsAllowed: ${isAllowed}`); // outputs true
console.log(`IsDisallowed: ${isDisallowed}`); // outputs false
Working demo: https://jsbin.com/yakeruduru/edit?html,console
The only thing I can think of: is the host and port of the URL being passed to robotsParser() the same as those of the URLs being checked? If not, isAllowed() will return undefined, as the robots.txt file won't apply to the given URL.
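For example, a minimal sketch of that mismatch case (the non-www host below is just one way the origin can differ):

const robotsParser = require('robots-parser');

const contents = '...robots.txt contents...'; // same placeholder as above
const robots = robotsParser('https://www.natureetdecouvertes.com/robots.txt', contents);

// Same host (and port) as the robots.txt URL: the rules apply.
robots.isAllowed('https://www.natureetdecouvertes.com/', 'SomeBot'); // true or false, depending on the rules

// Different host (e.g. missing "www"): the robots.txt doesn't apply to this
// URL, so isAllowed() returns undefined.
robots.isAllowed('https://natureetdecouvertes.com/', 'SomeBot'); // undefined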