robots-txt topic
useful-links
List of useful links, tools and resources
nuxt-humans-txt
🧑🏻👩🏻 "We are people, not machines" - An initiative to know the creators of a website. Contains information about the humans behind the building of the website - A Nuxt module to statically integrate and generate...
WebScraper
Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to evade website bot detection and avoid being blocked.
waybackrobots
Enumerate paths from old versions of robots.txt using the Wayback Machine, for content discovery
astro-launchpad
An Astro project template for decent projects: auth, i18next, Bootstrap, sitemap, webworker, robots.txt, preact, react, endpoints, endpoint clients, OAuth, various Astro features and data loading prec...
ai-training-opt-out
Known tags and settings suggested to opt out of having your content used for AI training.
weboptout
Opt-Out tool to check Copyright reservations in a way that even machines can understand.
jsitemapgenerator
Java sitemap generator. This library generates a web sitemap and can ping Google, generate RSS feeds, robots.txt, and more, with a friendly, easy-to-use Java 8 functional programming style
robots.txt
🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
robotify-netcore
Provides robots.txt middleware for .NET core
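
Several of the projects listed above parse robots.txt files and check rules against them. As a rough illustration of what such a rule check involves, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not taken from any of the listed repositories; the robots.txt content, the `ExampleAIBot` and `SomeCrawler` user-agent names, and the `build_parser` helper are all hypothetical and chosen only for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

User-agent: ExampleAIBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler falls under the "*" group.
print(parser.can_fetch("SomeCrawler", "https://example.com/index.html"))    # True
print(parser.can_fetch("SomeCrawler", "https://example.com/private/data"))  # False

# The hypothetical ExampleAIBot is disallowed everywhere.
print(parser.can_fetch("ExampleAIBot", "https://example.com/index.html"))   # False
```

Parsing from a string keeps the sketch self-contained; against a live site, `RobotFileParser.set_url()` followed by `read()` would fetch the file instead.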