robots-parser

Add explicit disallow feature

Open SimonC-Audigent opened this issue 1 year ago • 2 comments

There are some scenarios where we want to check whether the robots.txt explicitly disallows the UA (wildcard groups not included). This is common behaviour for ads crawlers such as Google's AdsBot (https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt), which disregards wildcard groups. To support that scenario, I added an explicit parameter to the isDisallowed method.
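
For illustration, a rough sketch of the proposed call shape against an illustrative robots.txt; the boolean flag is what this PR proposes, not part of the released API:

const robotsParser = require('robots-parser');

// Illustrative robots.txt with only a wildcard group.
const robots = robotsParser('https://example.com/robots.txt',
    'User-agent: *\nDisallow: /ads/');

// Current behaviour: the wildcard group applies to AdsBot-Google too.
robots.isDisallowed('https://example.com/ads/page', 'AdsBot-Google');       // true

// Proposed: the extra flag ignores wildcard groups, and since no group names
// AdsBot-Google directly, the URL is not considered explicitly disallowed.
robots.isDisallowed('https://example.com/ads/page', 'AdsBot-Google', true); // false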

SimonC-Audigent avatar Sep 17 '24 11:09 SimonC-Audigent

Thanks for the PR! That would be a useful feature to add.

Instead of adding it as a boolean, it might be better to make it a separate method. For example, the difference between:

if (robots.isDisallowed(url, 'my-ua', true)) {

and:

if (robots.isDisallowed(url, 'my-ua', false)) {

would require checking the documentation, but something like:

if (robots.isExplicitlyDisallowed(url, 'my-ua')) {

is a little more obvious.
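
For instance, with a separate method the intent reads directly at the call site. A sketch of the expected behaviour, using an illustrative robots.txt:

const robotsParser = require('robots-parser');

const robots = robotsParser('https://example.com/robots.txt',
    'User-agent: *\nDisallow: /\n\nUser-agent: my-ua\nDisallow: /private/');

// A rule written specifically for 'my-ua' counts as explicit...
robots.isExplicitlyDisallowed('https://example.com/private/page', 'my-ua'); // true

// ...while a path blocked only by the wildcard group does not.
robots.isExplicitlyDisallowed('https://example.com/other', 'my-ua');        // false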

Also, adding a parameter is not guaranteed to be backwards compatible. It's unlikely to break anything, but there's always the possibility that someone has done something like:

let results = arrayOfUserAgents.map(robots.isAllowed.bind(robots, 'some url'))

which would break if the parameter were added, because map passes each element's index as an extra argument.
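
Concretely, Array.prototype.map invokes its callback with (element, index, array), so after binding the URL the index would land in the new parameter slot. A sketch of the hazard, with illustrative values:

const robotsParser = require('robots-parser');

const robots = robotsParser('https://example.com/robots.txt',
    'User-agent: *\nDisallow: /private/');

const arrayOfUserAgents = ['googlebot', 'bingbot', 'my-ua'];

// The effective call per element is isAllowed('some url', userAgent, index):
// index 0 is falsy and 1, 2, ... are truthy, so a new boolean third parameter
// would silently change the result for every element after the first.
let results = arrayOfUserAgents.map(robots.isAllowed.bind(robots, 'some url'));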

samclarke avatar Sep 20 '24 15:09 samclarke

I refactored it to use isExplicitlyDisallowed and kept isDisallowed with its original signature (which is what I implemented at the beginning before changing my mind... but I forgot to remove it from the PR).

SimonC-Audigent avatar Sep 23 '24 11:09 SimonC-Audigent