crawler icon indicating copy to clipboard operation
crawler copied to clipboard

Go process used to crawl websites

crawler

Build Status Go Report Card Maintainability

Crawler is a Go written program designed to crawl website

features

  • use tor SOCKS proxy to crawl hidden services
  • fast, built using valyala/fasthttp (up to 10x faster than net/http)
  • extract both absolute and relative URLs
  • use scalable messaging protocol (nats)

how it work

  • The Crawler process connect to a nats server (specified by env variable NATS_URI) and set-up a subscriber for message with tag todoSubject
  • When an URL is received the crawler start crawling
  • When crawling is done, the crawler will publish content to nats server with subject contentSubject and found urls with subject doneSubject