robotstxt
List allow & disallow
Is it currently possible to list the allow and disallow paths together with their user agents, without querying for one particular user agent?
Duplicates https://github.com/temoto/robotstxt/pull/26
Right now there is no public API to read the parsed rules.
Please describe (ideally in pseudo-code) how you would use it.
This is part of a large web-scraping pipeline. Some of our clients have large robots.txt files with many disallowed paths, so we need to know those paths before scraping starts, and also for other SEO activities.
Example: https://plantx.com/robots.txt
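Since the library currently has no public API for this, here is a minimal standalone sketch of the kind of listing we need. This is not the robotstxt library's API; `listRules` and the `Rule` type are hypothetical names, and the parser only handles `User-agent`, `Allow`, and `Disallow` lines:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Rule is one Allow/Disallow line from robots.txt (hypothetical type).
type Rule struct {
	Allow bool
	Path  string
}

// listRules groups Allow/Disallow rules by the user-agent lines that
// precede them. Consecutive User-agent lines share the same rule group.
func listRules(data string) map[string][]Rule {
	rules := make(map[string][]Rule)
	var agents []string
	inGroup := false // true while collecting rules for the current agents
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // strip trailing comments
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		field := strings.ToLower(strings.TrimSpace(parts[0]))
		value := strings.TrimSpace(parts[1])
		switch field {
		case "user-agent":
			if inGroup { // rules were seen, so a new group starts here
				agents = nil
				inGroup = false
			}
			agents = append(agents, value)
		case "allow", "disallow":
			inGroup = true
			for _, a := range agents {
				rules[a] = append(rules[a], Rule{Allow: field == "allow", Path: value})
			}
		}
	}
	return rules
}

func main() {
	data := "User-agent: *\nDisallow: /checkout\nAllow: /blog\n"
	for _, r := range listRules(data)["*"] {
		fmt.Printf("allow=%v path=%s\n", r.Allow, r.Path)
	}
	// prints:
	// allow=false path=/checkout
	// allow=true path=/blog
}
```

With something like this exposed by the library, we could dump every rule for every agent in one pass over a client's robots.txt instead of probing paths one by one.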