
Respect `robots.txt`

Open • jimrandomh opened this issue 2 years ago • 1 comment

ML-based bots need to respect the same norms as every other bot on the web. That means sending an identifiable User-Agent, loading robots.txt, and not making requests to paths that robots.txt disallows.

jimrandomh, Mar 26 '23 22:03

I'd accept a PR that sets a custom User-Agent and adds a CLI option to respect robots.txt (enabled by default) 😁

m1guelpf, Mar 27 '23 17:03
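
For anyone picking this up, here is a rough sketch of the behaviour discussed above: identify the bot with a custom User-Agent and check robots.txt before each request, with an opt-out flag that defaults to on. It is written in Python using only the standard library, purely as an illustration; it is not the project's code. The `USER_AGENT` string, the `respect_robots` parameter, and the helper names are all hypothetical, and the sample robots.txt is made up.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical identifier; a real bot would use its own name and version.
USER_AGENT = "browser-agent-example/0.1"

# Made-up robots.txt content, used here instead of a network fetch.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str,
              respect_robots: bool = True,
              user_agent: str = USER_AGENT) -> bool:
    """Return True if the bot is permitted to request `url`.

    `respect_robots` mirrors the proposed option (on by default);
    turning it off skips the robots.txt check entirely.
    """
    if not respect_robots:
        return True
    return parser.can_fetch(user_agent, url)


def build_request(url: str, user_agent: str = USER_AGENT) -> Request:
    """Build a request that identifies the bot through its User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS)
    for target in ("https://example.com/index.html",
                   "https://example.com/private/data"):
        print(target, may_fetch(rules, target))
```

In a real crawler the parser would be populated from each site's own `/robots.txt` (for example with `RobotFileParser.set_url` followed by `read`) and cached per host, and `may_fetch` would be called before every navigation.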