Respect `robots.txt`

Open jimrandomh opened this issue 2 years ago • 1 comment

ML-based bots need to respect the same norms as all other bots on the web. That means providing an identifiable user agent, loading `robots.txt`, and avoiding requests to paths that `robots.txt` disallows.
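For illustration, here is a minimal sketch of those three steps in Python, using only the standard library's `urllib.robotparser`. It is not this project's code: the `USER_AGENT` string, the info URL inside it, and the helper names `build_parser`, `fetch_robots`, and `can_fetch` are all hypothetical.

```python
from urllib import robotparser
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Hypothetical identifier: a real bot should use its own name and a contact/info URL.
USER_AGENT = "example-ml-bot/0.1 (+https://example.com/bot-info)"


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse the text of a robots.txt file into a rule set."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_robots(page_url: str) -> robotparser.RobotFileParser:
    """Download robots.txt for the site hosting page_url, identifying ourselves.

    Error handling (for example, treating a 404 as "everything allowed")
    is omitted to keep the sketch short.
    """
    parts = urlsplit(page_url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    request = Request(robots_url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        text = response.read().decode("utf-8", errors="replace")
    return build_parser(text)


def can_fetch(parser: robotparser.RobotFileParser, url: str) -> bool:
    """Return True if the rules allow our user agent to request url."""
    return parser.can_fetch(USER_AGENT, url)
```

A bot would call `fetch_robots` once per host, cache the result, and skip any URL for which `can_fetch` returns `False`.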

Mar 26 '23 22:03 jimrandomh

I'd accept a PR that sets a custom User-Agent and adds a CLI option to respect `robots.txt` (enabled by default) 😁
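As a rough sketch of what such an option could look like, assuming a Python CLI built on `argparse` (Python 3.9+ for `BooleanOptionalAction`): the flag names `--respect-robots` and `--user-agent`, the program name, and the default agent string are illustrative assumptions, not names this project defines.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with hypothetical robots.txt and User-Agent options."""
    parser = argparse.ArgumentParser(prog="example-bot")
    # BooleanOptionalAction also generates a matching --no-respect-robots flag.
    parser.add_argument(
        "--respect-robots",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="obey robots.txt rules (enabled by default)",
    )
    parser.add_argument(
        "--user-agent",
        default="example-ml-bot/0.1",  # hypothetical default identifier
        help="User-Agent header sent with every request",
    )
    return parser
```

With this shape, running the tool with no flags keeps the `robots.txt` check on, and a user has to pass `--no-respect-robots` explicitly to turn it off.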

Mar 27 '23 17:03 m1guelpf