headless-chrome-crawler
You should be able to provide the robots.txt
What is the current behavior?
Today the project automatically resolves the robots.txt.
What is the expected behavior?
It would be useful to be able to provide the robots.txt directly, bypassing the default behavior of resolving it automatically.
What is the motivation / use case for changing the behavior?
- You may want to provide a different set of rules (say I'm the owner of the site and I want to check how the crawler would behave with a different robots.txt).
- In a large distributed environment, you may want to resolve the robots.txt once and share it with all the workers.
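
To make the request concrete, here is a rough sketch of the behavior I have in mind. It is not the project's code: `robotsTxt`, `resolveRobotsTxt`, `RobotsFetcher` and `isAllowed` are all hypothetical names, and the matcher is deliberately simplified (it only reads `Disallow` prefixes in the `User-agent: *` group and ignores `Allow` and wildcards).

```typescript
// Hypothetical sketch: none of these names come from headless-chrome-crawler.
type RobotsFetcher = (origin: string) => Promise<string>;

interface RobotsOptions {
  // Hypothetical option: raw robots.txt content supplied by the caller.
  robotsTxt?: string;
}

// Returns the caller-supplied robots.txt when present;
// otherwise falls back to fetching it (the current automatic behavior).
async function resolveRobotsTxt(
  origin: string,
  options: RobotsOptions,
  fetchRobotsTxt: RobotsFetcher,
): Promise<string> {
  if (options.robotsTxt !== undefined) {
    return options.robotsTxt;
  }
  return fetchRobotsTxt(origin);
}

// Simplified matcher: collects Disallow prefixes from groups that name "*".
function isAllowed(robotsTxt: string, path: string): boolean {
  let groupMatches = false;
  let prevWasAgent = false;
  const disallowed: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      groupMatches = prevWasAgent ? groupMatches || value === "*" : value === "*";
      prevWasAgent = true;
    } else {
      prevWasAgent = false;
      if (field === "disallow" && groupMatches && value !== "") {
        disallowed.push(value);
      }
    }
  }
  return !disallowed.some((prefix) => path.startsWith(prefix));
}
```

With something like this, a site owner could pass a draft robots.txt as `robotsTxt` to see which paths would be skipped, and in a distributed setup one process could fetch the file once and hand the same string to every worker.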