
List allow & disallow

Open TheUltimateCookie opened this issue 3 years ago • 2 comments

Is it currently possible to list all allow and disallow paths together with their user agents, without having to query for one particular user agent?

TheUltimateCookie • Nov 03 '22 10:11

Duplicates https://github.com/temoto/robotstxt/pull/26

Right now there is no public API to read parsed rules.

Please describe (best in pseudo-code) how you would use it.

temoto • Nov 03 '22 19:11

This is part of a large web scraping process. Some of our clients have large robots.txt files with many disallowed paths, so we need to know those paths before scraping starts, and we also use them for other SEO activities.

Example: https://plantx.com/robots.txt
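
A minimal sketch of the kind of listing described above. It deliberately does not use the robotstxt package, since no public API for reading parsed rules exists yet; instead it is a small standalone parser. The names `Rule` and `listRules` are hypothetical, the grouping logic is a simplification of how robots.txt groups work, and the sample content is made up for illustration rather than taken from the linked file.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Rule is a hypothetical record: one Allow/Disallow line for one user agent.
type Rule struct {
	Agent string
	Allow bool
	Path  string
}

// listRules is a hypothetical helper, not part of temoto/robotstxt.
// It walks robots.txt content line by line and returns every
// Allow/Disallow path paired with each user agent of its group.
func listRules(content string) []Rule {
	var rules []Rule
	var agents []string
	inRules := false // true once the current group has seen a rule line

	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop trailing comments
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)

		switch key {
		case "user-agent":
			// A User-agent line after rule lines starts a new group.
			if inRules {
				agents, inRules = nil, false
			}
			agents = append(agents, value)
		case "allow", "disallow":
			inRules = true
			for _, a := range agents {
				rules = append(rules, Rule{Agent: a, Allow: key == "allow", Path: value})
			}
		}
	}
	return rules
}

func main() {
	// Made-up sample content; real input would be the fetched robots.txt body.
	sample := "User-agent: *\nDisallow: /cart\nAllow: /products\n\nUser-agent: BadBot\nDisallow: /\n"
	for _, r := range listRules(sample) {
		verb := "Disallow"
		if r.Allow {
			verb = "Allow"
		}
		fmt.Printf("%s\t%s\t%s\n", r.Agent, verb, r.Path)
	}
}
```

The point of the sketch is the output shape: a flat list of (user agent, allow or disallow, path) entries that can be inspected before any scraping starts, without asking about one specific agent at a time.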

TheUltimateCookie • Nov 04 '22 07:11