browser-agent
Respect `robots.txt`
ML-based bots need to respect the same norms as every other bot on the web. That means sending an identifiable User-Agent, loading robots.txt, and not requesting paths that robots.txt disallows for that agent.
I'd accept a PR that sets a custom User-Agent and adds a CLI option to respect robots.txt (enabled by default) 😁
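For anyone picking this up, here is a rough sketch of the behaviour discussed above. It is written in Python with only the standard library, purely for illustration; it is not this project's code. The User-Agent string, the `--ignore-robots` flag name, and all helper names are hypothetical. The robots.txt rules are parsed from a string so the sketch runs without network access; a real implementation would fetch `/robots.txt` from the target host first.

```python
import argparse
import urllib.request
from urllib import robotparser

# Hypothetical identifier; the project would choose its own string.
USER_AGENT = "browser-agent-example/0.1"

# Sample rules used only for this illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def load_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse the text of a robots.txt file into a rule set."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def is_allowed(rules: robotparser.RobotFileParser, url: str,
               respect_robots: bool = True) -> bool:
    """Return True if the bot may request `url` under the given rules."""
    if not respect_robots:
        return True  # the user explicitly opted out of the check
    return rules.can_fetch(USER_AGENT, url)


def build_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the bot via its User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def build_arg_parser() -> argparse.ArgumentParser:
    """CLI sketch: robots.txt is respected unless the user opts out."""
    parser = argparse.ArgumentParser(description="robots.txt-aware fetch sketch")
    parser.add_argument(
        "--ignore-robots",  # hypothetical flag name
        dest="respect_robots",
        action="store_false",  # so respect_robots defaults to True
        help="skip the robots.txt check",
    )
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    rules = load_rules(EXAMPLE_ROBOTS)
    for url in ("https://example.com/index.html",
                "https://example.com/private/data"):
        verdict = "allowed" if is_allowed(rules, url, args.respect_robots) else "blocked"
        print(f"{url}: {verdict}")
```

Using a store-false flag keeps the check on by default, as requested above, and makes skipping it an explicit choice on the command line.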